2008-08-07 05:07 -!- tux3bot(~tux3bot@yzf.shapor.com) has joined #tux3
2008-08-07 05:56 http://web.archive.org/web/20060904185736/http://www.complang.tuwien.ac.at/anton/lfs/
2008-08-07 05:57 similar thought process on versioning
2008-08-07 06:56 wow- that's pretty cool.
2008-08-07 06:57 very similar method of managing free space
2008-08-07 07:01 -!- pgquiles(~pgquiles@224.Red-81-39-154.dynamicIP.rima-tde.net) has joined #tux3
2008-08-07 09:23 more purdy: http://shapor.com/tux3/
2008-08-07 09:50 bitchin'
2008-08-07 09:58 that really looks nice shapor
2008-08-07 09:58 is this now the "official site" or the mirror site?
2008-08-07 09:59 and wtf on zumastor?
2008-08-07 10:07 wtf?
2008-08-07 10:11 wtf wtf i mean
2008-08-07 10:11 :-0
2008-08-07 10:12 i'm thinking of organizing daniel's mailing list posts into a design document
2008-08-07 10:12 morning
2008-08-07 10:16 morning flips
2008-08-07 10:28 shapor, that would be most wonderful
2008-08-07 10:29 u guys heard of MogileFS?
2008-08-07 10:29 http://www.danga.com/mogilefs/
2008-08-07 10:29 never
2008-08-07 10:29 its a selfish effort, to absorb it all ;)
2008-08-07 10:29 distributed fs that allows you to run any type of fs locally
2008-08-07 10:30 Application level -- no special kernel modules required.
2008-08-07 10:30 veoh networks uses it. was developed for live journal
2008-08-07 10:30 sounds like not a filesystem really
2008-08-07 10:30 actually sounds a lot like google's gfs
2008-08-07 10:31 MogileFS is not:
2008-08-07 10:31 * POSIX Compliant
2008-08-07 10:31 kerneltrap picked up the matt dillon dialogue
2008-08-07 10:31 so yeah, much like google's gfs, not a real filesystem ;)
2008-08-07 10:31 useful thing maybe though
2008-08-07 10:32 who knows
2008-08-07 10:32 cloud?
2008-08-07 10:32 it's very useful that it's open source
2008-08-07 10:32 yeah, saw the *not* posix note
2008-08-07 10:32 like hadoop and memcached
2008-08-07 10:32 nice tools for building clusters for specific applications
2008-08-07 10:33 web 2.0 stuff
2008-08-07 10:33 speaking of which
2008-08-07 10:33 danga = livejournal guy
2008-08-07 10:33 someone at a startup i was talking to yesterday
2008-08-07 10:33 is running redhat's gfs
2008-08-07 10:33 heh
2008-08-07 10:33 fun?
2008-08-07 10:33 in a backend cluster
2008-08-07 10:33 and it keeps crashing
2008-08-07 10:33 when load gets high
2008-08-07 10:33 cascading failure
2008-08-07 10:33 all nodes die
2008-08-07 10:33 never woulda thunkit
2008-08-07 10:34 i'm lurking on the hadoop irc now
2008-08-07 10:34 that's how i heard of mogilefs
2008-08-07 10:35 the lfs article is a good read
2008-08-07 10:35 thinking way back, it was _really_ dumb of me to try to put the free space bitmaps inside the snapshot
2008-08-07 10:36 nobody questioned it at the time
2008-08-07 10:36 good thing you can question your own work, huh?
2008-08-07 10:37 if you live long enough
2008-08-07 10:37 I'm thinking about that because the lfs guy seems to be busy making the same mistake
2008-08-07 10:38 "At each snapshot we have a set of free blocks (and complementary, a set of allocated blocks) for the files reachable through this snapshot."
2008-08-07 10:39 btrfs uses per-block refcounts for that (ouch) and zfs uses dead block lists (complexity and weirdness)
2008-08-07 10:39 tux3 just has a conventional allocator and can figure out what blocks can actually be freed by looking at its version information
2008-08-07 10:40 this is a huge advantage to keeping all the version information for a given block together in one place
2008-08-07 10:40 flips: i found that site linked off comments on lwn from last year
2008-08-07 10:40 related to zumastor i think
2008-08-07 10:40 ah
2008-08-07 10:41 http://lwn.net/Articles/170346/
2008-08-07 10:41 hm thats not the one
2008-08-07 10:42 http://lwn.net/Articles/239369/
2008-08-07 10:42 that one
2008-08-07 10:42 ah yeah stumbled across it reading about gplv3
2008-08-07 10:43 that comment is not very accurate re wafl
2008-08-07 10:44 I like linus's comment
2008-08-07 10:44 "Umm. You are making the fundamental mistake of thinking that Sun is in
2008-08-07 10:44 this to actually further some open-source agenda."
2008-08-07 10:46 remember scott mcnealy wearing the penguin suit?
2008-08-07 10:51 lol that was a long time ago
2008-08-07 10:51 seems like only yesterday
2008-08-07 10:55 see "Free-space management and clones" in http://www.complang.tuwien.ac.at/anton/lfs/ and you will see diagrams much like versioned pointers
2008-08-07 10:55 yes!
2008-08-07 10:56 but he didn't go on to realize you could actually represent the version data that way
2008-08-07 10:56 only has rather similar freeable block algorithm
2008-08-07 10:57 also, you don't need .killed, on .born
2008-08-07 10:57 only .born
2008-08-07 10:57 you should email him
2008-08-07 10:57 invite him to the list
2008-08-07 10:57 sure
2008-08-07 10:57 he is a professor i believe
2008-08-07 10:57 he was talking about potentially putting some students on lfs
2008-08-07 10:57 might be useful for tux3
2008-08-07 10:58 very pretty site shapor
2008-08-07 10:58 hand coded html?
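The point made above, that tux3 keeps a conventional allocator and works out which blocks are freeable from the version information stored with each block, and that a pointer needs only a `.born` version, can be pictured with a small sketch. It is a hypothetical model, not tux3 code: the version tree is a `parent[]` array, each logical address has a short list of (born version, block) pairs, a version sees the pointer born in itself or in its nearest ancestor, and a block is freeable once no live version resolves to it. `struct vptr`, `resolve` and `freeable` are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_VERSION (-1)

/* Hypothetical versioned pointer: a block tagged only with the version in
 * which it was born. There is no .killed field; a later version overrides
 * the block for its own subtree simply by having a pointer of its own. */
struct vptr {
	int born;
	long block;
};

/* Which block does `version` see? Walk up the version tree (parent[] holds
 * each version's parent, NO_VERSION at the root) until the version itself
 * or an ancestor has a pointer. Returns -1 if nothing is inherited. */
static long resolve(const int *parent, const struct vptr *ptrs, size_t n,
		    int version)
{
	assert(version != NO_VERSION);
	for (int v = version; v != NO_VERSION; v = parent[v])
		for (size_t i = 0; i < n; i++)
			if (ptrs[i].born == v)
				return ptrs[i].block;
	return -1;
}

/* A block can go back to the ordinary, unversioned allocator once no live
 * version resolves to it. Deleted versions stay in the tree so that their
 * descendants keep inheriting correctly. */
static bool freeable(const int *parent, const bool *live, int nversions,
		     const struct vptr *ptrs, size_t n, long block)
{
	for (int v = 0; v < nversions; v++)
		if (live[v] && resolve(parent, ptrs, n, v) == block)
			return false;
	return true;
}
```

With a tree where versions 1 and 2 are children of 0 and version 3 is a child of 1, and block 200 born in version 1 over block 100 born in version 0, deleting versions 1 and 3 makes block 200 freeable while block 100 stays in use. Deleting version 1 alone would not free block 200, because version 3 still inherits it. The linear scans only keep the sketch short; they say nothing about how tux3 actually searches.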
2008-08-07 10:58 looks like you speak his language
2008-08-07 10:58 I think so
2008-08-07 10:58 yes I will email
2008-08-07 10:58 fartenpoopenspakenziepissin
2008-08-07 10:58 :-)
2008-08-07 10:58 genau (exactly)
2008-08-07 10:59 yes hand coded html
2008-08-07 10:59 looked like crap until i added the css
2008-08-07 10:59 its very simple just 1996-ish h1, h2, and li tags
2008-08-07 10:59 I have issues with this post http://www.complang.tuwien.ac.at/anton/memory-wall.html
2008-08-07 11:00 I think its a few years old
2008-08-07 11:01 there's no such thing as an infinitely fast CPU
2008-08-07 11:52 mind if I plagiarize your html?
2008-08-07 11:53 tim_dimm, what about the disk wall?
2008-08-07 11:53 or more properly, disk chasm
2008-08-07 12:08 disk wall? you mean wall-o-disk ?
2008-08-07 12:09 the big leap? the humongoid gap?
2008-08-07 12:49 was playing lego indiana jones last night, why I thought about chasms
2008-08-07 13:32 why:
2008-08-07 13:33 -if (!tuxread(&sb, 6, buf, 11))
2008-08-07 13:33 +if (tuxread(&sb, 6, buf, 11))
2008-08-07 13:34 wow 711 diggs!
2008-08-07 13:36 "But will Tux3 be the ReiserFS killer?"
2008-08-07 13:36 "The ReiserFS murderer"
2008-08-07 13:36 lol
2008-08-07 13:37 "Are you saying that Tux3 will be made from the blood, sweat and horror filled tears of the developers Wives..........probably :P"
2008-08-07 13:37 http://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices <- this is really helping me think about lvm3
2008-08-07 13:37 url?
2008-08-07 13:37 linked from http://shapor.com/tux3/
2008-08-07 13:37 the "digg" link at the bottom
2008-08-07 13:38 ah, I looked at that line lots of times without noticing the digg link
2008-08-07 13:39 the reiser jokes seem to keep the conversation going
2008-08-07 13:39 70 comments
2008-08-07 13:39 http://oss.oracle.com/projects/btrfs/dist/documentation/btrfs-volumes.html <- actually, this is the interesting one
2008-08-07 13:39 most of the good ones are "below the threshold"
2008-08-07 13:39 heh
2008-08-07 13:44 wow. css is a completely different syntax than html/xml
2008-08-07 13:48 yes it is metadata for the html to use
2008-08-07 13:48 the one i put up is in no way minimalistic
2008-08-07 13:49 it is originally from doxygen
2008-08-07 13:49 so it has lots of crap in it
2008-08-07 13:49 could be trimmed down to 10 lines to do the same thing
2008-08-07 13:54 mistake in your html
2008-08-07 13:55
2008-08-07 13:55 looking forward to the 10 line version
2008-08-07 13:56 i wasn't planning on making it 10 lines
2008-08-07 13:57 its also probably not w3c compliant
2008-08-07 13:58 http://validator.w3.org/check?uri=http%3A%2F%2Fshapor.com%2Ftux3
2008-08-07 14:01 flips: i just quickly pulled all the css out for classes
2008-08-07 14:01 since i dont set any, they never get used
2008-08-07 14:01 so it is equivalent
2008-08-07 14:09 created a mercurial tree for tux3 kernel
2008-08-07 14:17 linus knocked us off the top of the lkml.org hot list with his 2.6.27-rc2 announcement
2008-08-07 14:17 now #2
2008-08-07 14:33 heh
2008-08-07 15:17 only
2008-08-07 15:17 only #?
2008-08-07 15:17 wtf good is that?
2008-08-07 15:17 ;-)
2008-08-07 15:49 flips: bitbucket has a wiki feature, its a nicer way of storing notes/links than editing html
2008-08-07 15:49 best part is, its all tracked in a mercurial repo
2008-08-07 15:49 http://www.bitbucket.org/shapor/tux3/wiki/OtherFilesystems
2008-08-07 15:50 where can I download the source?
2008-08-07 15:50 http://www.wikicreole.org/ i think
2008-08-07 15:51 although i'm not sure what they are using to integrate with hg
2008-08-07 15:54 sudo apt-get install ikiwiki
2008-08-07 15:55 I like this: http://www.selenic.com/mercurial/wiki/
2008-08-07 15:55 oh neat
2008-08-07 15:55 yeah thats better than using some 3rd party thing
2008-08-07 15:56 little more work to setup though
2008-08-07 15:56 no doubt
2008-08-07 15:56 there was also one with an issue tracker somewhere
2008-08-07 15:57 people seem to be using rcs as a data backend for more and more things these days
2008-08-07 15:57 it just makes sense
2008-08-07 15:58 however it is quite slow
2008-08-07 15:58 http://moinmoin.wikiwikiweb.de/MoinMoin 500 - Internal Server Error
2008-08-07 16:05 http://www.mantisbt.org/bugs/view_all_bug_page.php
2008-08-07 16:10 http://moinmo.in/ <- works
2008-08-07 16:13 tim_dimm, want to try the dcc again?
2008-08-07 16:13 I keep failing to notice the popup
2008-08-07 16:13 not that xchat dcc has ever worked for me
2008-08-07 16:13 hm
2008-08-07 16:13 call me then
2008-08-07 16:13 something about nat and xchat being lame
2008-08-07 16:57 -!- MaZe(~MaZe@64.173.151.3) has joined #tux3
2008-08-07 16:59 http://www.physorg.com/news137325794.html
2008-08-07 17:13 hey maze
2008-08-07 17:13 ok, let's consider a serious issue: how to get people to send us beer
2008-08-07 17:13 I'll put a link on the tux3.org page when we figure out what the link should say
2008-08-07 17:17 http://tux3.org/ <- send beer here
2008-08-07 17:33 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-07 17:52 struct btree_ops {
2008-08-07 17:52 int (*leaf_verify)(SB, void *leaf);
2008-08-07 17:52 };
2008-08-07 17:52 struct btree_ops ftree_ops = {
2008-08-07 17:52 .leaf_verify = leaf_verify,
2008-08-07 17:52 };
2008-08-07 17:53 btree.c on its way to crappy-c-style object oriented
2008-08-07 18:06 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
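The `btree_ops` table above is the usual C way of getting per-type behaviour out of shared code: the generic btree routines hold a pointer to a table of function pointers and call through it. A minimal sketch of that pattern, assuming two leaf formats told apart by a magic number at the start of the block. The `leaf_init`/`leaf_sniff` pair echoes the `leaf_sniff` assertion that shows up later in this log, but `struct sb`, the magic values and `new_leaf` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the superblock handle passed as SB. */
struct sb {
	unsigned blocksize;
};

/* Hypothetical leaf methods in the style of struct btree_ops: generic
 * btree code only calls through this table and never parses a leaf. */
struct btree_ops {
	void (*leaf_init)(struct sb *sb, void *leaf);
	bool (*leaf_sniff)(struct sb *sb, void *leaf);
};

/* Made-up magic numbers identifying two leaf formats. */
enum { ILEAF_MAGIC = 0x90de, DLEAF_MAGIC = 0x1eaf };

static uint16_t leaf_magic(const void *leaf)
{
	uint16_t magic;
	memcpy(&magic, leaf, sizeof magic);	/* avoids alignment issues */
	return magic;
}

static void init_with_magic(struct sb *sb, void *leaf, uint16_t magic)
{
	memset(leaf, 0, sb->blocksize);
	memcpy(leaf, &magic, sizeof magic);
}

static void ileaf_init(struct sb *sb, void *leaf) { init_with_magic(sb, leaf, ILEAF_MAGIC); }
static bool ileaf_sniff(struct sb *sb, void *leaf) { (void)sb; return leaf_magic(leaf) == ILEAF_MAGIC; }
static void dleaf_init(struct sb *sb, void *leaf) { init_with_magic(sb, leaf, DLEAF_MAGIC); }
static bool dleaf_sniff(struct sb *sb, void *leaf) { (void)sb; return leaf_magic(leaf) == DLEAF_MAGIC; }

static struct btree_ops ileaf_ops = { .leaf_init = ileaf_init, .leaf_sniff = ileaf_sniff };
static struct btree_ops dleaf_ops = { .leaf_init = dleaf_init, .leaf_sniff = dleaf_sniff };

/* Generic code: build a leaf of whatever format `ops` describes, then
 * sanity-check it through the same table. */
static void *new_leaf(struct sb *sb, struct btree_ops *ops, void *buffer)
{
	ops->leaf_init(sb, buffer);
	assert(ops->leaf_sniff(sb, buffer));
	return buffer;
}
```

Adding another leaf type (free map, atime table, directory index) then means writing another pair of methods and another ops table, with no change to `new_leaf`.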
2008-08-07 18:37 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-07 19:13 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-07 21:44 shapor, a starting point: http://tux3.org/design.html
2008-08-07 22:11 oh thats nice
2008-08-07 22:12 i'm going to be pretty busy the next couple days, probably wont get a chance to do much until sunday
2008-08-07 22:13 docs should go in vcs too though
2008-08-07 22:15 flips: http://lwn.net/Articles/234441/
2008-08-07 22:16 mentions "versioned pointers"
2008-08-07 22:16 only to the root inode
2008-08-07 22:16 that was one of the few pre-existing hits on versioned pointers before my post
2008-08-07 22:17 now 10 out of first 10 google hits point at me and/or tux3
2008-08-07 22:18 hit number 22 is finally on some msft concept
2008-08-07 22:19 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-07 22:20 there are a lot of similarities coming from the logfs people
2008-08-07 22:20 but basically they are another tree versioned concept
2008-08-07 22:20 hamer is closer
2008-08-07 22:20 hammer
2008-08-07 22:20 see the beer link yet?
2008-08-07 22:20 i did
2008-08-07 22:20 so far no beer
2008-08-07 22:21 heheh
2008-08-07 22:21 nice
2008-08-07 22:21 yeah i saw it too ;)
2008-08-07 22:21 flips: only because you can't email beer ;)
2008-08-07 22:21 I don't have any beer here, only...
2008-08-07 22:21 Results 1 - 10 of about 308,000 for tux3.
2008-08-07 22:21 if only you could
2008-08-07 22:21 flips: i gave you one the other day!!
2008-08-07 22:21 I'm still subsisting on ramback beer
2008-08-07 22:21 so i'm the first beer contributor
2008-08-07 22:21 :-)
2008-08-07 22:21 true
2008-08-07 22:22 and what does that make me?
2008-08-07 22:22 although i think that may have hindered more than it helped ;)
2008-08-07 22:22 a contributor
2008-08-07 22:22 to juvenile delinquency
2008-08-07 22:22 I was pretty useless that night
2008-08-07 22:22 crashed just in front of my door
2008-08-07 22:22 good beer :-D
2008-08-07 22:22 hung out on 3rd st on the way back
2008-08-07 22:22 that was great
2008-08-07 22:22 hehe
2008-08-07 22:23 shapor, you caused me to almost break my wrist. you know that, right/
2008-08-07 22:23 ?
2008-08-07 22:23 ACTION is a bad influence
2008-08-07 22:23 showing off for drunks
2008-08-07 22:23 that's not right
2008-08-07 22:23 juvenile delinquency is a bad influence on us respectable citizens
2008-08-07 22:25 "It's good that we have projects like this I think. Sun and ZFS really need some competition, and maybe soon we'll have no need to envy ZFS."
2008-08-07 22:25 on the zfs list?
2008-08-07 22:25 "do you have ZFS envy" -> tux3.org poll question
2008-08-07 22:26 nice
2008-08-07 22:26 nice hook
2008-08-07 22:26 tuxopen(&sb, 123, buf, sizeof(buf), 0);
2008-08-07 22:27 Segmentation fault
2008-08-07 22:28 left out the init_buffers and tux3_init
2008-08-07 22:28 new node
2008-08-07 22:28 new leaf blocksize = 4096
2008-08-07 22:28 root at 1
2008-08-07 22:28 leaf at 2
2008-08-07 22:28 Thu Aug 7 22:28:01 2008: [10463] probe: Failed assertion "(ops->leaf_sniff)(sb, buffer->data)"
2008-08-07 22:28 http://www.scdsource.com/article.php?id=296
2008-08-07 22:28 Trace/breakpoint trap
2008-08-07 22:29 Looks like sicortex interconnect design
2008-08-07 22:29 somebody should push forward the i/o pin tech
2008-08-07 22:30 would not be that hard
2008-08-07 22:30 3D stacking for pins
2008-08-07 22:30 micro pins
2008-08-07 22:30 get way more that way
2008-08-07 22:30 actually, 3D switches with pins on the ends of the switches
2008-08-07 22:31 pin->switch->pin
2008-08-07 22:31 you know pins are currently .1 mm right? how stupid is that?
2008-08-07 22:31 ball pens
2008-08-07 22:31 they could easily be 1 um
2008-08-07 22:31 yeah, its a problem for the memory channel too
2008-08-07 22:31 problem isn't so much the pin, its the link
2008-08-07 22:31 on the mobo
2008-08-07 22:31 problem for most packages
2008-08-07 22:32 tim_dimm: from http://www.linux.com/feed/142781 comments
2008-08-07 22:32 idiotic that it hasn't been fixed reasonably
2008-08-07 22:32 nano-technology to the rescue
2008-08-07 22:33 "slew-rate requirements"
2008-08-07 22:33 data skew
2008-08-07 22:34 flips: wheres tuxopen ?
2008-08-07 22:34 not checked in yet
2008-08-07 22:34 pukes
2008-08-07 22:34 ah thats why its hard to find
2008-08-07 22:34 coming in about .5 hr, depending on whether I get into another ramback beer or not
2008-08-07 22:35 will that make it take more or less time? ;)
2008-08-07 22:35 more
2008-08-07 22:35 you found the fleaf bug yet?
2008-08-07 22:36 i haven't had a chance to look yet
2008-08-07 22:36 busy with work all day
2008-08-07 22:36 that's sick
2008-08-07 22:36 well some of us have day jobs!
2008-08-07 22:36 :P
2008-08-07 22:40 new leaf blocksize = 4096
2008-08-07 22:40 root at 1
2008-08-07 22:40 leaf at 2
2008-08-07 22:40 inode at 123
2008-08-07 22:40 Thu Aug 7 22:39:52 2008: [10543] ileaf_lookup: Failed assertion "at < leaf->count"
2008-08-07 22:43 pardon my noobness- what does assert do?
2008-08-07 22:45 similar to what it means in english
2008-08-07 22:45 "abort if this is not true"
2008-08-07 22:46 just figured that out
2008-08-07 22:46 http://www.cplusplus.com/reference/clibrary/cassert/assert.html
2008-08-07 22:46 me learn to use the google
2008-08-07 22:46 teh google
2008-08-07 22:46 yeah, the correct answer is http://www.justfuckinggoogleit.com/
2008-08-07 22:47 but its not very nice to send that link
2008-08-07 22:47 :P
2008-08-07 22:51 teh goggles
2008-08-07 22:53 beer goggles
2008-08-07 22:55 lookup inode 123, 0 + 123
2008-08-07 22:55 0 inodes, 4084 free:
2008-08-07 22:55 release buffer for 1
2008-08-07 22:55 release buffer for 2
2008-08-07 22:55 Thu Aug 7 22:54:52 2008: [10710] tuxopen: no inode 123
2008-08-07 22:55 good
2008-08-07 22:55 now to set the create flag and create the inode
2008-08-07 22:58 I know you guys are going to be at it until the wee hrs
2008-08-07 22:58 whoever wrote the ddsnap warn macro needs to be hurt
2008-08-07 22:58 not me
2008-08-07 22:58 wasn't I
2008-08-07 22:58 gotta recover
2008-08-07 22:58 I've been getting up at 6am for contractors all week
2008-08-07 23:00 <-crashing
2008-08-07 23:00 ttyl
2008-08-07 23:02 creating the inode requires abstracting add_extent_to_tree to also be able to add an inode
2008-08-07 23:03 starting with renaming as add_entity_to_tree
2008-08-07 23:33 flips: not my bug :)
2008-08-07 23:34 ACTION does a happy dance
2008-08-07 23:36 :-)
2008-08-07 23:36 mine?
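The `Failed assertion` lines in this log (timestamp, process id, function name, failed expression) come from an assert-style macro rather than the bare C `assert`, which writes an implementation-defined diagnostic to stderr and then calls `abort()`. A minimal sketch of such a macro, assuming that output format is all we want to reproduce. `report_failure` and `tux_assert` are hypothetical names; the real tux3 macro is not shown in the log, and judging by the `Trace/breakpoint trap` message it apparently ended in a trap rather than `abort()`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical: format one failure report in the style seen in the log,
 *   Thu Aug  7 22:28:01 2008: [10463] probe: Failed assertion "expr"
 * (%e pads the day with a space, as ctime() does). Kept separate from the
 * macro so it can be tested without aborting. */
static int report_failure(char *buf, size_t size, const struct tm *when,
			  long pid, const char *func, const char *expr)
{
	char stamp[64];

	assert(buf && when && func && expr);
	if (!strftime(stamp, sizeof stamp, "%a %b %e %H:%M:%S %Y", when))
		strcpy(stamp, "?");	/* contents are unspecified on failure */
	return snprintf(buf, size, "%s: [%ld] %s: Failed assertion \"%s\"",
			stamp, pid, func, expr);
}

/* Hypothetical assert replacement: print the report, then abort. */
#define tux_assert(expr) do {						\
	if (!(expr)) {							\
		char msg_[512];						\
		time_t now_ = time(NULL);				\
		report_failure(msg_, sizeof msg_, localtime(&now_),	\
			       (long)getpid(), __func__, #expr);	\
		fprintf(stderr, "%s\n", msg_);				\
		abort();						\
	}								\
} while (0)
```

A call such as `tux_assert(at < leaf->count);` inside `ileaf_lookup` would then print a line shaped like the one at 22:40 above before aborting.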
2008-08-07 23:39 actually, maybe mine
2008-08-07 23:39 you've got mail :)
2008-08-07 23:40 that was about 30 minutes of debugging
2008-08-07 23:41 most of which was just isolating the issue
2008-08-07 23:42 we should put a test in the main which tests all the boundary conditions
2008-08-07 23:42 good call on grouplim = 7
2008-08-07 23:42 would have taken us a lot longer to hit this otherwise
2008-08-07 23:43 yes
2008-08-07 23:43 use the same trick as I did in btree.c, redefine main as notmain, include in another file and go crazy
2008-08-07 23:44 there's a lot to be said for having a very simple single file sanity test as now, at this point
2008-08-07 23:44 when i have more time
2008-08-07 23:44 i can rest now knowing that one is fixed
2008-08-07 23:44 there's eventually going to be a generic fuzztester for all leaf methods
2008-08-07 23:44 i dont want to discover 5 more which require 30 min of debugging each ;)
2008-08-07 23:45 but knowledge of the specific boundary conditions means you can write a better single purpose test
2008-08-07 23:45 not tonight anyway
2008-08-07 23:46 nice fix
2008-08-07 23:46 did that used to read entry == entries
2008-08-07 23:47 i think i might have changed it to !group->count when i added splitting
2008-08-07 23:47 i remember poking it before, might have just been in testing
2008-08-07 23:48 how do i go back to an old rev in hg
2008-08-07 23:48 test now succeeds indeed
2008-08-07 23:48 hg co
2008-08-07 23:49 woo
2008-08-07 23:49 unsigned limit = !group->count || entry == entries ? 0 : (entry + 1)->limit;
2008-08-07 23:49 *entry = (struct entry){ .loglo = loglo, .limit = limit };
2008-08-07 23:58 yeah, see anything wrong?
2008-08-07 23:58 lgtm
2008-08-07 23:59 if you're worried about limit=0, don't be, you go through and increment all the limits by 1 starting at entry*
2008-08-08 00:00 btw it was your bug... hg annotate says rev 15 ;)
2008-08-08 01:22 Fri Aug 8 01:21:43 2008: [11736] tuxopen: no inode 123
2008-08-08 01:22 Fri Aug 8 01:21:43 2008: [11736] tuxopen: new inode 123
2008-08-08 01:22 expand inum 291 at 0/0 by 16
2008-08-08 01:22 :-)
2008-08-08 01:22 yes it was my bug
2008-08-08 01:22 subtle indeed
2008-08-08 01:23 have to check two boundary conditions
2008-08-08 01:30 it threw me for a loop at first because it happened right after a split
2008-08-08 01:30 so i scrutinized the split code, which of course is flawless :P
2008-08-08 01:50 Fri Aug 8 01:49:40 2008: [30550] leaf_insert: Failed assertion "tail >= 0"
2008-08-08 02:48 hmm
2008-08-08 07:50 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-08 09:09 flips: i made a new test.. that error was happening due to a corrupt tree
2008-08-08 09:09 after split group with count 1 at 1
2008-08-08 09:10 due to one single entry having grouplim entries
2008-08-08 09:10 which would never happen if grouplim==255 and we limited snapshots to 255
2008-08-08 11:14 hi
2008-08-08 11:15 you mean the new assertion above or the one you just fixed?
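The fix discussed above hinges on two boundary conditions when a new entry goes into a group whose entries carry running `limit` counts: the new entry inherits its neighbour's limit (or 0 when there is no neighbour), and then every limit from the new entry onward is bumped by one. The sketch below is a simplified, hypothetical model of that idea, not the tux3 leaf code. It stores entries in ascending order (the real code reads `(entry + 1)->limit`, which suggests a different layout) and treats `limit` as the cumulative number of extents up to and including each entry; `insert_entry` is a made-up name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical group entry: loglo is a logical address, limit is the
 * cumulative count of extents in this entry and all entries before it. */
struct entry {
	unsigned loglo;
	unsigned limit;
};

/* Insert an entry holding one extent at index `at` of a group that has
 * `count` entries and room for one more. Returns the new count. */
static unsigned insert_entry(struct entry *entries, unsigned count,
			     unsigned at, unsigned loglo)
{
	assert(at <= count);
	/* Boundary conditions: in an empty group, or when inserting in front
	 * of every other entry, there is no predecessor, so start from 0. */
	unsigned limit = count == 0 || at == 0 ? 0 : entries[at - 1].limit;

	memmove(entries + at + 1, entries + at, (count - at) * sizeof *entries);
	entries[at] = (struct entry){ .loglo = loglo, .limit = limit };

	/* The new extent shifts every running total from here on by one,
	 * which is why a starting limit of 0 is harmless. */
	for (unsigned i = at; i <= count; i++)
		entries[i].limit++;
	return count + 1;
}
```

Inserting loglo 10 into an empty group, appending 20, then inserting 5 at the front leaves entries 5, 10, 20 with limits 1, 2, 3.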
2008-08-08 11:15 g'mornin
2008-08-08 11:15 new one
2008-08-08 11:15 that sounds good
2008-08-08 11:15 the thing i fixed is right
2008-08-08 11:15 it looked right
2008-08-08 11:16 last night i used the #define main trick
2008-08-08 11:16 and gave it a few stress tests
2008-08-08 11:16 that is the only way it fails
2008-08-08 11:16 when a group has one entry which has grouplim entries in it
2008-08-08 11:17 in other words, more than grouplim different versions
2008-08-08 11:17 well no
2008-08-08 11:18 er
2008-08-08 11:18 it fell out of cache now
2008-08-08 11:18 heh
2008-08-08 11:18 oh right
2008-08-08 11:18 its exactly what i said earlier
2008-08-08 11:19 i think
2008-08-08 11:19 ACTION looks for all the things shapor said earlier
2008-08-08 11:19 the problem is splitting a group of count 1
2008-08-08 11:19 obviously useless
2008-08-08 11:20 because it has grouplim entries
2008-08-08 11:20 yes, what I said I thought
2008-08-08 11:20 oh, yes, you're right
2008-08-08 11:20 we're both right just using different words
2008-08-08 11:20 good
2008-08-08 11:21 sorry i confused yself
2008-08-08 11:21 myself*
2008-08-08 11:21 one "fix" is worth 3 dozen "confuseds"
2008-08-08 11:21 you're checking in your test?
2008-08-08 11:23 sure
2008-08-08 11:32 short n sweet
2008-08-08 11:33 fleaf.c becomes dleaf.c, data leaf
2008-08-08 11:34 reason is, I want fleaf for the free tree leaf
2008-08-08 11:34 so: ileaf (inode) dleaf (file data) fleaf (free blocks) aleaf (atimes)
2008-08-08 11:35 oh, and directory leaf :-/
2008-08-08 11:35 that is another dleaf
2008-08-08 11:35 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-08 11:35 ah, but that is hleaf
2008-08-08 11:36 dleaf could also be eleaf, extent leaf
2008-08-08 11:36 that is better I think
2008-08-08 11:37 -!- pgquiles(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-08 11:37 maybe use 3 letters instead
2008-08-08 11:37 dirleaf
2008-08-08 11:37 that would fix it
2008-08-08 11:37 filleaf
2008-08-08 11:37 freleaf
2008-08-08 11:37 less confusing
2008-08-08 11:38 tlaleaf
2008-08-08 11:38 heh
2008-08-08 11:38 tnaleaf
2008-08-08 11:40 ileaf (inode table leaf) eleaf (extent tree leaf) fleaf (free extents leaf) aleaf (atime table leaf) dleaf (directory dirent block)
2008-08-08 11:41 can change our mind a few more times
2008-08-08 11:41 -!- pgquiles_(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-08 11:41 mercurial and git both lack metadata for tracking name changes
2008-08-08 11:41 suxors
2008-08-08 12:44 -!- pgquiles_(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-08 18:25 "Results 1 - 10 of about 308,000 for tux3"
2008-08-08 18:37 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-08 19:06 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3
2008-08-08 19:52 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-08 19:56 http://www.linkedin.com/pub/0/a57/305
2008-08-08 22:18 zfs design docs are really lame
2008-08-08 22:18 e.g.: http://opensolaris.org/os/community/zfs/structures/;jsessionid=9640F315CA43C6E0BED9C5DAEE2767D2
2008-08-08 22:49 ACTION gives up on finding any complete, coherent zfs design doc
2008-08-09 00:26 flips: ping
2008-08-09 01:07 shapor, pong
2008-08-09 01:15 maze, hi over here
2008-08-09 01:15 sure, no problem, I've been very busy - what about you?
2008-08-09 01:16 me too
2008-08-09 01:16 I've seen there's a lot of new email in my inbox about tux3
2008-08-09 01:16 and I haven't finished reading all of the old emails... ;-(
2008-08-09 01:16 is there enough tux3 code up now to get out of the "lame no code" category?
2008-08-09 01:16 (I just got back from a 'conference' like 2 days)
2008-08-09 01:17 there are a lot of emails to read
2008-08-09 01:17 particularly the matt dillon series
2008-08-09 01:17 http://kerneltrap.org/Linux/Comparing_HAMMER_And_Tux3
2008-08-09 01:17 depends, ultimately I guess until you have something that works as a filesystem (even if only as a userspace library)...
2008-08-09 01:17 that isn't even all of them
2008-08-09 01:17 it's close to that
2008-08-09 01:17 can open/make an inode now, read/write a file
2008-08-09 01:18 now need to open _then_ read/write
2008-08-09 01:18 and have multiple files
2008-08-09 01:18 not far away
2008-08-09 01:18 then directories... bitmap allocation...
2008-08-09 01:18 deletion...
2008-08-09 01:18 versions...
2008-08-09 01:18 ;-)
2008-08-09 01:19 atomic commit...
2008-08-09 01:19 one of the things that seemed to consistently crop up about filesystems at the meeting I was just at, was that a small amount of flash, to either have the superblock, or all the metadata (possibly including 'small' files) in flash is something desirable
2008-08-09 01:21 for tux3, being able to write the beginning of forward log chains to nvram would be nice
2008-08-09 01:21 only a few bytes needed
2008-08-09 01:21 but the commit logging strategy is going to run near media speed I think
2008-08-09 01:22 will get to that in a week or two
2008-08-09 01:22 ..fragmentation...
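The allocation plan described in the next few messages, a "quadratic hash-like bounce" to successively farther allocation goals, resembles quadratic probing in an open-addressed hash table. A minimal sketch under that assumption: the log does not specify the exact sequence or the generating functions tux3 would use, so `bounce_goal` and its alternating plus/minus k² offsets are hypothetical. Because the k-th goal depends only on the home block and k, two different updates that both get bounced away from the same home try the same places in the same order, which is the property mentioned in the log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical k-th allocation goal for data whose preferred ("home")
 * block is `home` on a volume of `volblocks` blocks. k = 0 is home itself;
 * after that the goal alternates sides with growing quadratic offsets,
 * +1, -1, +4, -4, +9, -9, ..., wrapping around the end of the volume. */
static uint64_t bounce_goal(uint64_t home, unsigned k, uint64_t volblocks)
{
	assert(volblocks > 0 && home < volblocks);
	uint64_t step = (k + 1) / 2;
	uint64_t offset = step * step % volblocks;

	if (k % 2)
		return (home + offset) % volblocks;		/* odd k: forward */
	return (home + volblocks - offset) % volblocks;	/* even k: backward */
}
```

A caller would try goals k = 0, 1, 2, ... until it finds free space; a real allocator would also cap k and might prefer a search order that favours forward goals.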
2008-08-09 01:22 right, there is at least a plan
2008-08-09 01:23 using generating functions to do a quadratic hash-like bounce to successively further away allocation goals
2008-08-09 01:23 meaning that when data does get bounced away from home, different updates get bounced to the same place
2008-08-09 01:23 right
2008-08-09 01:24 also, the logging strategy likes a certain amount of fragmentation
2008-08-09 01:24 this leaves places to store commit blocks at convenient places all over the volume
2008-08-09 01:24 shapor did some killer bug hunting
2008-08-09 01:25 right, I'll really have to read the design docs to get a better feel for all this
2008-08-09 01:26 design doc is essentially the lkml post converted to html
2008-08-09 01:26 shapor talked about adding some of the mailing list posts to it
2008-08-09 01:27 if he doesn't get around to it, I will add some
2008-08-09 01:27 yes, that would be much appreciated
2008-08-09 01:27 the generic btree thing came true
2008-08-09 01:27 works like a charm
2008-08-09 01:27 I haven't been able to be as involved as I would like to be
2008-08-09 01:28 generic as in?
2008-08-09 01:28 both for inodes and file content?
2008-08-09 01:28 as in the same btree code works for the inode table and file indexes
2008-08-09 01:28 and soon will also implement the free map
2008-08-09 01:28 yes
2008-08-09 01:28 and atime table, and directory indexes
2008-08-09 01:28 and I think I will add yet another table, a table of volume roots
2008-08-09 01:29 it's basically trivial to have multiple filesystems share the same allocation space
2008-08-09 01:29 do multiple filesystems really differ from directories in one root?
2008-08-09 01:29 yes, they don't invade each other's directories
2008-08-09 01:42 -!- pgquiles(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-09 01:48 -!- pgquiles_(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-09 01:51 struct inode *inode = tuxopen(&sb, 0x123, 1);
2008-08-09 01:51 tuxwrite(inode, 6, "hello world", 11);
2008-08-09 01:51 if (tuxread(inode, 6, buf, 11))
2008-08-09 01:51 return 1;
2008-08-09 01:51 hexdump(buf, 11);
2008-08-09 01:51 works
2008-08-09 01:51 "open by inum"
2008-08-09 01:51 need to make it open by name now
2008-08-09 01:52 6 is the blocknum to write it on... need to make that byte offset and update the size attribute properly
2008-08-09 02:00 -!- pgquiles__(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3
2008-08-09 02:05 -!- pgquiles_(~pgquiles@d515302D0.access.telenet.be) has joined #tux3
2008-08-09 02:14 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3
2008-08-09 02:27 -!- pgquiles__(~pgquiles@d515302D0.access.telenet.be) has joined #tux3
2008-08-09 02:35 -!- pgquiles_(~pgquiles@d515302D0.access.telenet.be) has joined #tux3
2008-08-09 03:42 my rip of ext2/dir.c compiles, I'll test it... later
2008-08-09 06:00 -!- pgquiles(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-09 06:04 -!- pgquiles_(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-09 06:13 -!- pgquiles__(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3
2008-08-09 06:17 -!- pgquiles_(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-09 07:39 -!- pgquiles(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-09 07:45 -!- pgquiles_(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-09 07:54 -!- pgquiles__(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-09 07:55 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-09 12:07 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-09 12:56 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-09 14:00 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-09 14:36 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-09 15:11 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-09 16:29 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-09 18:21 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-10 00:35 -!- pgquiles(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-10 00:39 -!- pgquiles_(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3
2008-08-10 01:03 -!- pgquiles__(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-10 01:22 -!- pgquiles_(~pgquiles@d515302D0.access.telenet.be) has joined #tux3
2008-08-10 02:47 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3
2008-08-10 02:55 -!- pgquiles(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-10 04:43 -!- pgquiles(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-10 04:48 -!- pgquiles_(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-10 04:53 -!- pgquiles__(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3
2008-08-10 05:00 -!- pgquiles_(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-10 06:19 -!- pgquiles__(~pgquiles@d515302D0.access.telenet.be) has joined #tux3
2008-08-10 08:28 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-10 10:01 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-10 10:45 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-10 11:29 -!- pgquiles(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-10 11:37 -!- pgquiles_(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-10 11:42 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-10 12:19 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-10 12:39 -!- pgquiles_(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-10 13:47 -!- pgquiles_(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-11 01:39 -!- pgquiles(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-11 01:46 -!- pgquiles_(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-11 02:07 -!- pgquiles__(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3
2008-08-11 03:16 -!- pgquiles_(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-11 06:38 -!- Kirantpatil(~kiran@122.167.181.85) has joined #tux3
2008-08-11 06:39 Hello list
2008-08-11 06:40 i have a few questions
2008-08-11 06:51 i am getting confused about which filesystem to use for storage
2008-08-11 06:51 tux3 or btrfs
2008-08-11 06:53 i have been watching Daniel's posts from zumastor and tux3 list
2008-08-11 07:04 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-11 07:32 -!- pgquiles(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-11 08:07 -!- pgquiles_(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3
2008-08-11 08:37 -!- pgquiles(~pgquiles@d515302D0.access.telenet.be) has joined #tux3
2008-08-11 08:42 -!- pgquiles_(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-11 09:36 -!- Kirantpatil(~kiran@122.167.181.85) has left #tux3
2008-08-11 10:15 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3
2008-08-11 13:21 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3
2008-08-11 13:24 -!- pgquiles(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-11 13:46 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-11 15:06 -!- pgquiles_(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-11 15:11 -!- pgquiles__(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-11 16:56 -!- pgquiles__(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-11 22:20 -!- Kirantpatil(~kiran@122.167.211.229) has joined #tux3
2008-08-11 22:20 -!- Kirantpatil(~kiran@122.167.211.229) has left #tux3
2008-08-12 00:00 -!- pgquiles(~pgquiles@d515302D0.access.telenet.be) has joined #tux3
2008-08-12 00:01 -!- pgquiles(~pgquiles@d515302D0.access.telenet.be) has joined #tux3
2008-08-12 00:06 -!- pgquiles_(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3
2008-08-12 00:16 -!- pgquiles__(~pgquiles@d515302D0.access.telenet.be) has joined #tux3
2008-08-12 02:27 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3
2008-08-12 05:06 -!- pgquiles(~pgquiles@d515302D0.access.telenet.be) has joined #tux3
2008-08-12 05:12 -!- pgquiles(~pgquiles@d515302D0.access.telenet.be) has joined #tux3
2008-08-12 06:13 -!- Kirantpatil(~kiran@122.167.176.118) has joined #tux3
2008-08-12 06:21 -!- Kirantpatil(~kiran@122.167.176.118) has left #tux3
2008-08-12 07:02 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-12 07:09 -!- pgquiles(~pgquiles@d515302D0.access.telenet.be) has joined #tux3
2008-08-12 08:18 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-12 13:09 -!- pgquiles(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-12 13:42 -!- pgquiles(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-12 13:51 -!- pgquiles_(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-12 13:54 http://www.smh.com.au/news/off-the-field/bills-blue-screen-of-death-malfunction/2008/08/12/1218306871673.html <- bsod at the olympics opening ceremony
2008-08-12 13:54 bill gates was apparently present
2008-08-12 14:10 yeah saw that ;)
2008-08-12 14:17 -!- pgquiles_(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-12 14:38 -!- pgquiles_(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-12 14:43 -!- pgquiles__(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-12 15:21 -!- pgquiles_(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3
2008-08-12 16:26 grunt. Just ported viro's ext2_readdir to tux3
2008-08-12 17:19 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3
2008-08-12 21:44 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-08-12 23:26 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3
2008-08-13 01:15 -!- pgquiles(~pgquiles@d515302CB.access.telenet.be) has joined #tux3
2008-08-13 01:22 -!- pgquiles_(~pgquiles@d515302D0.access.telenet.be) has joined #tux3
2008-08-13 01:26 -!- pgquiles__(~pgquiles@d515302D0.access.telenet.be) has joined #tux3
2008-08-13 01:32 -!- pgquiles_(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3
2008-08-13 02:06 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3
2008-08-13 03:28 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3
2008-08-13 03:40 -!- Kirantpatil(~kiran@122.167.219.1) has joined #tux3
2008-08-13 03:40 -!- Kirantpatil(~kiran@122.167.219.1) has left #tux3
2008-08-13 05:37 -!-
pgquiles(~pgquiles@d515302D0.access.telenet.be) has joined #tux3 2008-08-13 06:01 -!- pgquiles_(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-08-13 06:29 -!- pgquiles__(~pgquiles@d515302D0.access.telenet.be) has joined #tux3 2008-08-13 07:43 -!- pgquiles__(~pgquiles@d515302D0.access.telenet.be) has joined #tux3 2008-08-13 09:13 -!- pgquiles_(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-08-13 09:19 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-08-13 10:53 lwn linked the tux3 structure post 2008-08-13 10:53 got a google alert before the weekly edition is even posted 2008-08-13 10:54 that must mean that Jon creates the article pages already on the web before adding the top level link 2008-08-13 11:10 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-13 11:43 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-13 14:47 or it might mean that you traveled into the future and got the links. Well, did you? 2008-08-13 16:48 caught me ;-) 2008-08-13 16:48 ok, time to implement the allocation bitmaps now 2008-08-14 00:48 -!- pgquiles(~pgquiles@d515302D0.access.telenet.be) has joined #tux3 2008-08-14 00:54 -!- pgquiles_(~pgquiles@d515302CB.access.telenet.be) has joined #tux3 2008-08-14 00:59 -!- pgquiles__(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-08-14 01:15 -!- pgquiles_(~pgquiles@d515302D0.access.telenet.be) has joined #tux3 2008-08-14 11:48 -!- pgquiles(~pgquiles@d54C56A6E.access.telenet.be) has joined #tux3 2008-08-14 16:03 -!- boom(~boom@c-76-117-208-224.hsd1.nj.comcast.net) has joined #tux3 2008-08-14 16:04 Hello, was just reading about tux3 and it sounded interesting. 2008-08-14 16:39 boom: welcome 2008-08-14 16:56 shapor: Thanks :D 2008-08-14 16:56 I'm sorrry to report that I am not a developer and as such have little to offer but encouragement. 
2008-08-14 17:04 once we have some complete code you'd certainly be welcome to test it and find bugs :) 2008-08-14 17:07 Sounds good. 2008-08-14 18:02 if you're not a developer you can send beer :-) 2008-08-14 20:35 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-08-14 22:41 -!- flips(~phillips@phunq.net) has joined #tux3 2008-08-15 09:52 -!- boom(~boom@c-76-117-208-224.hsd1.nj.comcast.net) has joined #tux3 2008-08-15 09:52 -!- flips(~phillips@phunq.net) has joined #tux3 2008-08-15 10:59 -!- pgquiles(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3 2008-08-15 11:17 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-15 11:34 -!- pgquiles(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3 2008-08-15 11:35 Instead of beer, I have an offer for a free month of netflix, anyone interested? 2008-08-15 11:37 hmm, beer helps when coding, movies might be distracting 2008-08-15 11:37 thanks for the offer though! 2008-08-15 11:37 flips might 2008-08-15 11:38 we already have netflix :-) 2008-08-15 11:39 -!- pgquiles_(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3 2008-08-15 11:42 Ah well. 2008-08-15 11:49 -!- pgquiles__(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3 2008-08-15 12:11 -!- pgquiles__(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3 2008-08-15 12:11 flips: ping 2008-08-15 12:15 hi pqguiles 2008-08-15 12:15 good to see you here :-) 2008-08-15 12:15 pgquiles I meant 2008-08-15 12:15 whoops 2008-08-15 12:15 got to do something about those typos 2008-08-15 12:15 :-) 2008-08-15 12:16 flips: do you know Strigi? ( http://strigi.sf.net ) 2008-08-15 12:16 hi pgquiles 2008-08-15 12:16 I didn't know it 2008-08-15 12:16 I'm at aKademy and this morning while having breakfast I was talking with Strigi's lead developer. 
He says he would be interested in adding indices to the filesystem, so I acted as a pointer to you 2008-08-15 12:17 pau = &flips; 2008-08-15 12:17 I would like to provide support in tux3 for indexing daemons 2008-08-15 12:17 hi shapor 2008-08-15 12:18 I would like to work out a system where the filesystem can notice an indexer accurately about such things as new hard links 2008-08-15 12:18 perfect! 2008-08-15 12:18 ah cool, a hdd crawler 2008-08-15 12:18 so i'm guessing it does its own io throttling 2008-08-15 12:19 i've written such a tool before for backups 2008-08-15 12:19 well minus the indexing 2008-08-15 12:19 shapor: it does cpu throtting and he's working on implementing io throtting 2008-08-15 12:19 iothrottling is key, esp with fast cpus today 2008-08-15 12:20 i thought they were adding some kernel features to make that easier 2008-08-15 12:20 the good thing about strigi is it comes by default with kde4 and it integrates with nepomuk (semantic desktop and all) 2008-08-15 12:20 pgquiles, he should come onto the tux3 mailing list and say what he would like 2008-08-15 12:20 flips: that's exactly what I told him :-) 2008-08-15 12:20 :-) 2008-08-15 12:22 he should be arriving to The Netherlands in a couple of hours, I guess he'll subscribe in a few days 2008-08-15 12:22 79 members on the mailing list now 2008-08-15 12:23 there's a lot of interest in tux3 2008-08-15 12:25 well I'd better make another checkin then 2008-08-15 12:25 bitmap allocation almost integrated 2008-08-15 12:43 shapor, got a 64 bit printf patch for me? 
Just needs (long long) cast for all parameters printed as %L that are not actually long long on 64 bit linux 2008-08-15 12:43 or less verbosely, (u64), will work just as well 2008-08-15 12:43 on my laptop at home 2008-08-15 12:44 :( 2008-08-15 12:44 come to think of it, we should make all those easily findable 2008-08-15 12:44 by typedeffing something for them 2008-08-15 12:44 (foo_t) 2008-08-15 12:45 (llcompat) 2008-08-15 12:45 something 2008-08-15 12:45 something short hopefully 2008-08-15 12:46 (fudge) 2008-08-15 12:46 typedef long long fudge, ok? 2008-08-15 12:52 casting to u64 doesn't work 2008-08-15 12:52 because u64 is long unsigned int on 64 2008-08-15 12:53 and Lx expects long long unsigned int 2008-08-15 12:53 there isn't a u64 fmt string unfortunately 2008-08-15 12:54 casting to long long on 64 is kinda silly, i guess that is 128 or something? 2008-08-15 12:55 that's ok though 2008-08-15 12:55 it's just a printf 2008-08-15 12:55 mostly tracing anyway 2008-08-15 12:56 right about the u64 2008-08-15 12:56 so typedef long long fudge 2008-08-15 12:56 no u64 2008-08-15 12:57 typedef long long widen; 2008-08-15 12:57 there, that looks halfway civilized 2008-08-15 12:58 then when somebody gets around to implementing type aware printing in C it's an easy splat edit 2008-08-15 12:58 that is, when hell freezes over ;-) 2008-08-15 12:59 would be more accurate to say, when tux3 gets ported to c++ 2008-08-15 12:59 which will also be when hell freezes over because c++ does not support designated initializers 2008-08-15 13:00 it's really c--++ 2008-08-15 13:04 doesn't build 2008-08-15 13:04 dleaf.c:425: error: 'balloc' undeclared here (not in a function) 2008-08-15 13:05 need to include balloc.c or something 2008-08-15 13:06 happens in all the files actually 2008-08-15 13:07 hmm or forward declare it tux3.h ? and link to a balloc.o ? 
2008-08-15 13:08 flips: ^ fail 2008-08-15 13:17 shapor, will fix 2008-08-15 13:32 shapor, that should do it 2008-08-15 13:35 it is stupid that you can't go back and fix commit comments in hg 2008-08-15 13:35 that is a problem with the git + hg + monotone breed 2008-08-15 13:36 also no real renames 2008-08-15 13:36 there remains room for another step forward in version control 2008-08-15 13:44 -!- pgquiles(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3 2008-08-15 13:50 -!- pgquiles_(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3 2008-08-15 14:10 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-15 14:25 -!- pgquiles_(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3 2008-08-15 14:44 -!- pgquiles__(~pgquiles@d54C5B8AA.access.telenet.be) has joined #tux3 2008-08-15 18:03 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-15 18:51 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-15 22:44 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-08-16 02:39 -!- Kirantpatil(~kiran@122.167.212.233) has joined #tux3 2008-08-16 02:39 -!- Kirantpatil(~kiran@122.167.212.233) has left #tux3 2008-08-16 11:14 -!- flips(~phillips@phunq.net) has joined #tux3 2008-08-16 11:14 -!- ChanServ changed mode/#tux3 -> -o flips 2008-08-16 14:42 -!- pgquiles(~pgquiles@132.Red-217-125-199.dynamicIP.rima-tde.net) has joined #tux3 2008-08-16 17:06 -!- bh(~billh@ip68-107-26-122.sd.sd.cox.net) has joined #tux3 2008-08-16 17:06 doesn't tux3 suck as a web server compared to apache2 ? 
2008-08-16 17:17 bh: this channel isn't about tux3 the web server 2008-08-16 17:18 -!- ChanServ changed mode/#tux3 -> +o shapor 2008-08-16 17:49 bh -> bill huery 2008-08-16 17:49 huey 2008-08-16 17:50 ah ;) 2008-08-16 17:50 nice troll bh 2008-08-16 17:51 ACTION gets on a little more caffein 2008-08-16 17:53 smelled like a troll, thats why i op'ed myself ;) 2008-08-16 17:53 didn't make the connection 2008-08-16 17:58 :-) 2008-08-16 17:58 bh is like that, only made one of him 2008-08-16 17:58 I am thinking... probably crazy thought 2008-08-16 17:58 but it's getting not far away from kernel port time 2008-08-16 17:59 I wonder if some of the porting could be automated with a perl script 2008-08-16 17:59 haha 2008-08-16 17:59 see crazy above 2008-08-16 17:59 perhaps 2008-08-16 17:59 things like changing printf to printk 2008-08-16 17:59 instead of #define printf printk 2008-08-16 18:27 a couple of big differences between the user space code and kernel: 1) kackers don't like c99 inline decls 2) bit fields... are they consistently implemented across arches? 
likely not 2008-08-16 18:28 could possibly isolate those differences behind inline access functions buried in header files 2008-08-16 22:06 woohoo, bitmap flush worked on the first try after plugging in the btree root 2008-08-16 22:08 merging the changes in with my format string fix went smoothly 2008-08-16 22:08 hg++ 2008-08-16 22:37 indeed 2008-08-16 22:38 could you try it with the (unsigned int) casts removed 2008-08-16 22:38 I don't think there should be warnings but I'm prepared to be re-educated 2008-08-16 22:38 oh your merge 2008-08-16 22:38 actually I haven't tried merge in hg yet 2008-08-16 22:39 I'm glad to hear it's pleasant 2008-08-16 22:40 ok, just need to do free block and bitmap allocation is finished right up to the first kernel drop 2008-08-16 22:44 dleaf.c:133: warning: format '%u' expects type 'unsigned int', but argument 2 has type 'long int' 2008-08-16 22:45 just needs to be (int) to make it happy i suppose though 2008-08-16 22:45 since we're doing pointer math we expect to be a small value 2008-08-16 23:22 I think 2008-08-16 23:23 (int) works 2008-08-16 23:23 although casting to int specifically for %u seems silly 2008-08-16 23:23 really 2008-08-16 23:23 it is odd that pointer difference returns long in instead of int 2008-08-16 23:23 (unsigned int) makes more sense to me 2008-08-16 23:23 only on 64 bit 2008-08-16 23:24 we can also do the cast on the 32 bit side 2008-08-16 23:24 and use %lu as the format string? 2008-08-16 23:25 best way to do it is a silly thing to worry about since all the printfs are going away 2008-08-16 23:25 yes 2008-08-16 23:26 nice to keep some of the tracing around and expect it to work 2008-08-16 23:26 well we know it will work 2008-08-16 23:26 but i suspect printk is somewhat different anyway? 2008-08-16 23:28 very similar 2008-08-16 23:28 been honed by unix wookies for years 2008-08-16 23:32 ok, put back pretty much the way you had it 2008-08-16 23:32 yuck huh? 
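The allocation bitmap being finished in this stretch keeps one bit per block, set when the block is in use. Below is a minimal sketch of allocation with a goal block and wraparound, plus a free that catches double frees. struct bitmap, bitmap_alloc and bitmap_free are hypothetical names, not tux3's balloc. The sketch also assumes one flat in-memory array, whereas a real filesystem works on bitmap blocks read and dirtied through its buffer cache.

```c
#include <stdint.h>

/* Hypothetical in-memory allocation bitmap: bit n set means block n is in use. */
struct bitmap {
	uint8_t *bits;
	uint64_t blocks;
};

/* Allocate the first free block at or after goal, wrapping around to
 * block 0. Returns the block number, or -1 if every block is in use. */
int64_t bitmap_alloc(struct bitmap *map, uint64_t goal)
{
	for (uint64_t i = 0; i < map->blocks; i++) {
		uint64_t block = (goal + i) % map->blocks;
		uint8_t mask = 1 << (block & 7);
		if (!(map->bits[block >> 3] & mask)) {
			map->bits[block >> 3] |= mask;
			return (int64_t)block;
		}
	}
	return -1;
}

/* Free a block. Returns 0, or -1 if it is out of range or already free. */
int bitmap_free(struct bitmap *map, uint64_t block)
{
	if (block >= map->blocks)
		return -1;
	uint8_t mask = 1 << (block & 7);
	if (!(map->bits[block >> 3] & mask))
		return -1; /* double free */
	map->bits[block >> 3] &= (uint8_t)~mask;
	return 0;
}
```

Starting the search at a goal block keeps related blocks close together on disk. The wraparound means a goal near the end of the volume can still succeed.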
2008-08-16 23:33 ACTION done for today 2008-08-16 23:34 hm 2008-08-16 23:34 have you thought about the user interface to snapshot data 2008-08-16 23:35 say i want to copy a directory and all the previous version of it 2008-08-16 23:35 snapshot data or snapshot? 2008-08-16 23:35 to say, another tux3 filesystem 2008-08-16 23:35 all previous versions, no 2008-08-16 23:35 just a specified version 2008-08-16 23:35 copy with history is an interesting challenge 2008-08-16 23:36 could be useful for quite a few things 2008-08-16 23:36 sounds hard 2008-08-16 23:36 i know, thats why i asked 2008-08-16 23:36 but if the use case is compelling... 2008-08-16 23:37 copy history does sounds useful 2008-08-16 23:37 very 2008-08-16 23:37 say you take daily snapshots 2008-08-16 23:37 but you want to do a weekly backup to tape 2008-08-16 23:37 which includes all those incremental daily changes 2008-08-16 23:37 or even hourly 2008-08-16 23:38 easy, back up a bunch of deltas 2008-08-16 23:38 can you get a delta for a particular directory? 2008-08-16 23:38 that is the plan 2008-08-16 23:38 of a given file 2008-08-16 23:38 what is the interface? 2008-08-16 23:38 some ddlink thing 2008-08-16 23:38 with some c program driving the ddlink 2008-08-16 23:39 in other words, go crazy 2008-08-16 23:39 yeah 2008-08-16 23:39 but that wasn't the usecase i originally thought of 2008-08-16 23:39 i was trying to simplify it 2008-08-16 23:39 probably best to think about the primitive ops and think how to link them up to do complex things 2008-08-16 23:40 thinking about an organized delta store 2008-08-16 23:40 that you can not only write to but read from... and do what with? 
2008-08-16 23:40 something like the guy did with zumastor maybe 2008-08-16 23:41 mountable deltas 2008-08-16 23:41 that would rule the universe 2008-08-16 23:41 but hard, probably 2008-08-16 23:41 yeah migrate them to nearline storage 2008-08-16 23:41 i bet you think doing it with ddsnap delta files was hard too though 2008-08-16 23:41 and someone did it ;) 2008-08-16 23:42 speaking of whom, you should loop him in about tux3 2008-08-16 23:44 yes 2008-08-16 23:44 remember the subject line? 2008-08-16 23:45 ddloop perhaps? 2008-08-16 23:45 I thought that was my name 2008-08-16 23:46 "Backups using ddsnap 2008-08-16 23:46 http://www.nomorevoid.com/downloads/dm-ddloop.tar.gz 2008-08-16 23:47 " 2008-08-16 23:47 right on our home page 2008-08-16 23:47 was the subject on the list 2008-08-16 23:47 yeah 2008-08-16 23:47 got it 2008-08-16 23:48 why dont google groups archives appear in google search results? 2008-08-16 23:48 because goog suxorx? 2008-08-16 23:48 http://groups.google.com/group/zumastor/browse_thread/thread/c95970acdc2e31ca/cc9931d18043f31f 2008-08-16 23:49 http://www.google.com/search?q=%22backups+using+ddsnap%22 2008-08-16 23:49 doesn't find it unfortunately 2008-08-16 23:57 googlegroups is really too lame to have a mailing list on 2008-08-16 23:57 it would be fine is gmane was subscribed 2008-08-16 23:57 I'll set up a new zumastor mailing list 2008-08-16 23:59 t A following integer conversion corresponds to a ptrdiff_t argument -- printf man page 2008-08-17 00:00 now lets see if printk has it 2008-08-17 00:00 set it up where? 2008-08-17 00:00 @tux3.org? 2008-08-17 00:01 if you move it you'd have re-subscribe all the members 2008-08-17 00:01 sounds like a pita 2008-08-17 00:03 true 2008-08-17 00:03 printk has %t 2008-08-17 00:03 so maybe we should use it on the use every feature principle 2008-08-17 00:04 whats %t? 2008-08-17 00:05 difference of pointers :p 2008-08-17 00:05 does printf? 
2008-08-17 00:06 also 2008-08-17 00:09 http://lxr.linux.no/linux+v2.6.26.2/lib/vsprintf.c#L779 2008-08-17 00:09 -!- flips(~phillips@phunq.net) has left #tux3 2008-08-17 00:09 -!- flips(~phillips@phunq.net) has joined #tux3 2008-08-17 00:10 http://lxr.linux.no/linux+v2.6.26.2/lib/vsprintf.c#L779 <- the documentation for printk 2008-08-17 00:11 ah i see 2008-08-17 00:11 yes we should be using that 2008-08-17 00:11 perfect 2008-08-17 00:13 finally 2008-08-17 00:13 i've always wanted to read the printf man page to see all the features i never use 2008-08-17 00:13 this is a good reason to do so 2008-08-17 00:14 uhoh 2008-08-17 00:15 ? 2008-08-17 00:16 nice no compile warnings on 64 2008-08-17 00:17 %ti does make more sense than %tu 2008-08-17 00:17 in case of an error it will be easy to see the negative value then 2008-08-17 00:18 right 2008-08-17 03:06 -!- pgquiles(~pgquiles@132.Red-217-125-199.dynamicIP.rima-tde.net) has joined #tux3 2008-08-17 06:38 morning pgquiles 2008-08-17 06:39 hey flips 2008-08-17 06:40 a little early for here 2008-08-17 06:44 what time is it over there? 2008-08-17 06:44 6:45 2008-08-17 06:44 that hurts :-) 2008-08-17 06:45 yup 2008-08-17 06:45 gotta get some more zzz's 2008-08-17 06:45 see all the tux3 checkins 2008-08-17 06:45 shapor got another patch, 3rd or 4th I think 2008-08-17 06:46 did a Jos van den Oever write you? 
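The 't' length modifier found above is C99's way to print a ptrdiff_t. ptrdiff_t is long on x86-64 but int on i386, so any fixed %u or %lu warns on one of the two. Per the vsprintf.c link in the log, printk accepts it too. Below is a minimal sketch with a hypothetical helper, format_offset. It uses %ti, which prints the value as signed, so a pointer that ends up before its base shows as an obviously negative number.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format the distance between two pointers into
 * the same array. %ti (equivalently %td) matches ptrdiff_t on any arch. */
int format_offset(char *buf, size_t len, const char *base, const char *p)
{
	ptrdiff_t off = p - base;
	return snprintf(buf, len, "offset %ti", off);
}
```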
2008-08-17 06:46 he's the guy developing Strigi 2008-08-17 06:52 not yet 2008-08-17 06:53 he'll eventually do 2008-08-17 06:54 getting indexing working well for kde would please me 2008-08-17 06:55 me too :-) 2008-08-17 06:55 anyway, strigi is not tied to kde 2008-08-17 06:56 even better 2008-08-17 06:56 but kde will use it early I suppose 2008-08-17 06:56 yes, it is already using it 2008-08-17 06:57 in 4.0 and 4.1 you need to explicitly enable it but in 4.2 will be enabled by default 2008-08-17 06:57 ACTION loves kde 2008-08-17 06:57 flips: there's Camp KDE in January in Jamaica, you know :-) 2008-08-17 06:57 wow, even more south than I already am 2008-08-17 06:58 I will keep it in mind 2008-08-17 06:59 I'm going to sleep one hour, I still need to recover from aKademy 2008-08-17 06:59 see you later 2008-08-17 06:59 bye 2008-08-17 11:56 -!- pgquiles_(~pgquiles@126.Red-80-39-172.dynamicIP.rima-tde.net) has joined #tux3 2008-08-17 13:42 ACTION goes wacks tree_expand into 2 big pieces 2008-08-17 13:43 typos galor 2008-08-17 13:54 whacking completed 2008-08-17 13:54 now to make it make sense 2008-08-17 13:58 7 parameters for insert_child now down to 6... 2008-08-17 15:14 iattr.c coming soon 2008-08-17 15:14 you think perhaps a dir should contain a default file attr entry? 
2008-08-17 15:15 yes for sure 2008-08-17 15:15 in most cases dirs contain very similar files 2008-08-17 15:15 and be inherited instead of storing the same attr in each inode 2008-08-17 15:15 exactly 2008-08-17 15:15 inode attrs come in groups 2008-08-17 15:15 one of the groups is ctime/mode/uid/guid 2008-08-17 15:15 the "create" group 2008-08-17 15:16 that one can usually be inherited except for ctime 2008-08-17 15:16 well 2008-08-17 15:16 so, break out the ctime 2008-08-17 15:16 so there is maybe two flavors of the create group 2008-08-17 15:16 ctime only and ctime plus ownership 2008-08-17 15:17 it might be ok to always separate them 2008-08-17 15:17 a version and a 4 bit attr type field have to go in each attr 2008-08-17 15:18 so: struct ctime { u64 kind:4, version:10, time: 50 }; 2008-08-17 15:20 and struct owner { u64 kind:4, version:10, mode: 50, uid:32, gid:32 }; 2008-08-17 15:20 8 bytes for the first, 16 bytes for the second 2008-08-17 15:21 16 bytes saved every time it is possible to inherit the owner from the directory 2008-08-17 15:21 which is something like 99.9% of the time on my system I think 2008-08-17 15:25 ACTION just wrote a sick one-liner to determine the average number of different file perms a directory contains 2008-08-17 15:26 here is the distribution from my web server: 2008-08-17 15:26 651 0 2008-08-17 15:26 4318 1 2008-08-17 15:26 92 2 2008-08-17 15:26 8 3 2008-08-17 15:26 4 4 2008-08-17 15:27 so, 651 directories with no files in them, 4318 with only one set of perms on the files contained in it 2008-08-17 15:27 92 with 2, etc 2008-08-17 15:27 aka, output of: 2008-08-17 15:27 find / -xdev -type d -exec sh -c 'ls -l $1 | awk "/^\-/ {print \$1}" | sort -u |wc -l' {} {} \; | sort | uniq -c 2008-08-17 15:38 across the entire filesystem there are only 30 unique permissions settings 2008-08-17 15:38 only about 500k files 2008-08-17 15:38 nice observation 2008-08-17 15:38 s/only/on/ 2008-08-17 15:38 so there should be owner atoms? 
2008-08-17 15:38 or sorry 2008-08-17 15:38 permission atoms 2008-08-17 15:40 above, 4318 out of 6000 files save 16 bytes 2008-08-17 15:41 running on my system 2008-08-17 15:41 those are directories 2008-08-17 15:41 not files 2008-08-17 15:41 right 2008-08-17 15:41 directories have attrs just like files, I'm getting a sense of overall saving 2008-08-17 15:42 ah 2008-08-17 15:42 considering just files, what is the single permissions percentage? 2008-08-17 15:43 you mean, how many files live in directories with only one set of permissions? 2008-08-17 15:43 this is also a fun one to run: 2008-08-17 15:43 find / -xdev -printf "%m\n" 2>/dev/null| sort | uniq -c| sort -n 2008-08-17 15:44 out of 53972 files, only 24 unique permissions sets 2008-08-17 15:44 41605 files are mode 644 2008-08-17 15:46 talking] 2008-08-17 15:49 39537 files live in directories with only one set of permissions 2008-08-17 15:52 its total brain damage to store 0644 all over the filesystem 2008-08-17 15:58 ACTION keeps getting more excited about tux3 2008-08-17 16:01 flips: where did "16 bytes" come from? 2008-08-17 16:02 each set of permissions is only 12 bits 2008-08-17 16:02 struct owner { u64 kind:4, version:10, mode: 50, uid:32, gid:32 }; 2008-08-17 16:02 they come in groups 2008-08-17 16:02 oh 2008-08-17 16:02 i see 2008-08-17 16:02 they have to be discriminated for traversal and versioned 2008-08-17 16:02 that is why putting them in groups is a win 2008-08-17 16:03 mode:50 ? 2008-08-17 16:03 some extra ;-) 2008-08-17 16:03 at this point I am trying to preserve 8 byte granularity 2008-08-17 16:03 I am not sure if that matters 2008-08-17 16:03 i see 2008-08-17 16:03 I can go to byte aligned and save more space 2008-08-17 16:03 probably the right thing to do actually 2008-08-17 16:04 but it is a detail of iattr.c 2008-08-17 16:04 thats a lot of extra mode bits 2008-08-17 16:04 why? 2008-08-17 16:04 struct owner { u64 kind:4, version:10, mode: 16, pad: 36, uid:32, gid:32 }; 2008-08-17 16:04 ok? 
2008-08-17 16:05 oh i guess we should support "chattr" 2008-08-17 16:05 struct owner { u64 kind:4, version:10, mode:16, pad:4, uid:32, gid:32 }; <- 12 bytes] 2008-08-17 16:05 um 2008-08-17 16:05 I'm on drugs 2008-08-17 16:06 16 bytes 2008-08-17 16:06 so you were right, :50 was just silly 2008-08-17 16:06 chattr must be supported of course 2008-08-17 16:06 how many bits does chattr need 2008-08-17 16:06 chattr the command? 2008-08-17 16:07 yeah 2008-08-17 16:07 ioctl(3, EXT2_IOC_GETFLAGS, 0x7fff16bffeec) = 0 2008-08-17 16:08 struct owner { u64 kind:4, version:10, mode:20, uid:32, gid:32 }; <- 12 bytes] 2008-08-17 16:08 20 mode bits should do it 2008-08-17 16:08 we need to support that interface 2008-08-17 16:09 how does that work with the vfs layer 2008-08-17 16:09 http://lxr.linux.no/linux+v2.6.26.2/include/linux/ext2_fs.h#L200 2008-08-17 16:09 http://lxr.linux.no/linux+v2.6.26.2/+ident=11795060 2008-08-17 16:11 ext2 has 22 of its own per file flags, mostly bullshit 2008-08-17 16:11 unused 2008-08-17 16:11 things like compression 2008-08-17 16:12 tail packing 2008-08-17 16:12 #define EXT2_INDEX_FL FS_INDEX_FL /* hash-indexed directory */ <- one of the few that actually got implemented (by me) 2008-08-17 16:12 heh 2008-08-17 16:13 anyway, we will divide them into commonly used flags that go in the 20 mode bits and rare ones that go in some other attribute type 2008-08-17 16:14 16 basic attribute types should be enough 2008-08-17 16:14 one of those types is "extended attribute" 2008-08-17 16:14 ok 2008-08-17 16:14 well there are only 8 spare mode bits 2008-08-17 16:15 not counting chattr 2008-08-17 16:15 i was just trying to figure out how many chattr needed 2008-08-17 16:15 ACTION does man chattr 2008-08-17 16:15 not very many I think 2008-08-17 16:16 ----------------- /tmp 2008-08-17 16:16 thats a lot of dashes (output from lsattr) 2008-08-17 16:17 98% of your files could inherit struct owner from parent directory 2008-08-17 16:17 just crunched it 2008-08-17 16:17 
my system is: 2008-08-17 16:17 4113 0 2008-08-17 16:17 16326 1 2008-08-17 16:17 419 2 2008-08-17 16:17 20 3 2008-08-17 16:17 8 4 2008-08-17 16:17 3 5 2008-08-17 16:17 1 7 2008-08-17 16:18 yeah 2008-08-17 16:18 my system is 97.3 2008-08-17 16:19 % 2008-08-17 16:19 probably rarely under 90% 2008-08-17 16:19 you are 97.7 actually 2008-08-17 16:19 even on multiuser systems 2008-08-17 16:19 you are 97.6 actually 2008-08-17 16:19 true 2008-08-17 16:19 its rare to work in the same directory as another user 2008-08-17 16:20 that is 261K of kernel cache saved for full stat 2008-08-17 16:20 on my system 2008-08-17 16:20 which is not particularly big fs 2008-08-17 16:21 man ls -lR is fucking slow 2008-08-17 16:21 ddtree is fscking fast ;-) 2008-08-17 16:22 it stats /etc/localtime for EVERY file ?? 2008-08-17 16:22 bleah 2008-08-17 16:22 work of the fsf unfortunately 2008-08-17 16:27 http://lxr.linux.no/linux+v2.6.26.2/fs/ext2/inode.c#L1147 <- ext2_set_inode_flags 2008-08-17 16:28 total of 10 flags 2008-08-17 16:28 per inode 2008-08-17 16:28 almost all seem sensible 2008-08-17 16:31 struct owner { u64 kind:4, version:10, mode:18, uid:32, gid:32 }; <- 12 bytes 2008-08-17 16:31 too tight maybe 2008-08-17 16:32 unless mode is an atom 2008-08-17 16:32 anyway bask to inums 2008-08-17 20:38 -!- tux3bot(~tux3bot@yzf.shapor.com) has joined #tux3 2008-08-17 20:41 -!- shapor(~shapor@yzf.shapor.com) has joined #tux3 2008-08-17 21:38 tuxkey_t *next_key(struct treepath *path, int levels) 2008-08-17 21:38 { 2008-08-17 21:38 for (int level = levels; --level < 0; ) 2008-08-17 21:38 if (!finished_level(path, level)) 2008-08-17 21:38 return &path[level].next->key; 2008-08-17 21:38 return NULL; 2008-08-17 21:38 } 2008-08-17 21:39 shapor, how does it look/ 2008-08-17 21:40 ok? 
2008-08-17 21:42 flips: that doesn't seem very complex 2008-08-17 21:43 just looks up the path until it finds a level where we haven't read all the way to the end of the index block yet 2008-08-17 21:44 there we find a key that separates the subtree we are in (a leaf) from the next subtree to the right 2008-08-17 21:45 yeah i was able to read the c, you dont need to put that in a comment ;) 2008-08-17 21:45 just wondering why you were pasting it 2008-08-17 21:45 aw, I was just doing that ;-) 2008-08-17 21:45 because I haven't tested it 2008-08-17 21:45 heh 2008-08-17 21:45 it's part of the inode table block insertion stuff 2008-08-17 21:46 oh 2008-08-17 21:46 i dont know how that all fits together 2008-08-17 21:46 knowing the successor key tells us whether we should advance to the next block to search it or insert a new one, becauase the successor block is too far away in key space 2008-08-17 21:46 i think you described it in a post recently 2008-08-17 21:46 or no 2008-08-17 21:48 not in that detail 2008-08-17 21:48 and not everybody will be able to see the purpose from the code, so I put in the comment 2008-08-17 21:53 this function the way it is written can actually replace a nasty little bit of code in the delete, because it returns a pointer 2008-08-17 21:53 instead of the key 2008-08-17 21:53 in the delete, we have to find that successor key and change it 2008-08-17 21:57 i was thinking about the size and mtime 2008-08-17 21:57 do we have to update those? 2008-08-17 21:57 yes 2008-08-17 21:57 every time the file changes 2008-08-17 21:57 with a little bit of slop 2008-08-17 21:57 definitely on every fsync 2008-08-17 21:58 this is a job for the log of course 2008-08-17 21:58 tuxkey_t *p = next_key(path, levels), next = p ? *p : MAX_INODES; <- obscure enough? 
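As pasted, next_key's loop condition `--level < 0` is false on the first test for any positive `levels`. The loop body therefore never runs, and the function always returns NULL. The intended walk up the path needs `--level >= 0`. Below is a minimal runnable sketch with that change. tuxkey_t as uint64_t, struct index_entry, struct treepath (with `next` already advanced past the entry we descended through, and `end` one past the last entry) and finished_level are all assumptions made just so it compiles. They are not tux3's definitions.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t tuxkey_t;              /* assumption: 64-bit keys */

struct index_entry {                    /* hypothetical index block entry */
	tuxkey_t key;                       /* child block pointer omitted */
};

struct treepath {                       /* hypothetical path element */
	struct index_entry *next;           /* entry right of the one we took */
	struct index_entry *end;            /* one past the last entry */
};

static int finished_level(struct treepath *path, int level)
{
	return path[level].next == path[level].end;
}

/* Walk up from the deepest index level. The first level that still has
 * an entry to the right holds the key separating our subtree from the
 * next one. Returns a pointer so a caller (e.g. delete) can rewrite it. */
tuxkey_t *next_key(struct treepath *path, int levels)
{
	for (int level = levels; --level >= 0; )
		if (!finished_level(path, level))
			return &path[level].next->key;
	return NULL;
}
```

With this, the successor lookup from the log, `tuxkey_t *p = next_key(path, levels), next = p ? *p : MAX_INODES;`, works as intended. It declares `p` as a pointer and `next` as a plain `tuxkey_t`, since the `*` binds only to `p`.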
2008-08-17 21:59 no i like it 2008-08-17 21:59 makes perfect sense 2008-08-17 22:00 traditionalists have trouble with C code like that 2008-08-17 22:00 then they need to be reminded that C99 was last millenium already 2008-08-17 22:00 doesn't work though 2008-08-17 22:01 a lot of folks are still scared of structure assignment 2008-08-17 22:01 let along inline decls 2008-08-17 22:01 let alone 2008-08-17 22:01 what doesnt work 2008-08-17 22:01 it doesn't work to remind them 2008-08-17 22:01 oh 2008-08-17 22:01 traditionalists are like that 2008-08-17 22:02 it is convenient that the maximum inodes and maximum blocks of a filesystem are the same 2008-08-17 22:02 that means there is a density of one inode per block 2008-08-17 22:02 howso 2008-08-17 22:03 which is convenient for deciding which inum to use 2008-08-17 22:03 over time with deletes that nice simple relationship deteriorates 2008-08-17 22:03 then we need other mojo 2008-08-17 23:04 flips: why are some of the data structures in the c files and some in tux3.h? 2008-08-17 23:13 because me being lame 2008-08-17 23:13 the c files are included 2008-08-17 23:13 like header files 2008-08-17 23:13 this is only for while it's in heavy hack mode 2008-08-17 23:14 when everything has to be declared twice it gets in the way of quick refactoring 2008-08-17 23:21 so struct inode is just from ext2 i guess? 2008-08-17 23:21 vfs 2008-08-17 23:22 with version added ? 
2008-08-17 23:22 generic inode has version, yes, it is a different, bogus version 2008-08-17 23:22 stupid misnamed field 2008-08-17 23:22 i see 2008-08-17 23:23 it is really a change counter 2008-08-17 23:23 ah 2008-08-17 23:23 so that ext2/3 knows when it has to go look in the directory in case the entry it was positioned on was deleted 2008-08-17 23:23 it is that specific 2008-08-17 23:23 calling that a version was a heinous crime 2008-08-17 23:28 i'll just ignore it ;) 2008-08-17 23:29 it gets used in dir.c for revalidate, nowhere else 2008-08-17 23:36 causes running inode.c to fail 2008-08-17 23:36 what does? 2008-08-17 23:39 er i thought i pasted the assert 2008-08-17 23:40 whoops 2008-08-17 23:40 [10869] probe: Failed assertion "(ops->leaf_sniff)(sb, buffer->data)" 2008-08-17 23:41 that's fresh out of repository? 2008-08-17 23:42 maybe you have to give a filename on the command line? 2008-08-17 23:42 ah 2008-08-17 23:42 not a good error message to be sure 2008-08-17 23:42 yeah, my bad 2008-08-17 23:43 woo, it really tries hard even when it has no backing file 2008-08-17 23:44 I wonder if this is good 2008-08-17 23:45 generated 32 bitmap blocks, that is not good behaviou 2008-08-17 23:45 behavior 2008-08-17 23:46 [18006] main: Bitmap flush failed (Bad file descriptor) <- this is nice 2008-08-17 23:47 fd '(null)' = -1 (0xffffffff bytes) <- this maybe not so nice 2008-08-17 23:49 sb->image.blocks has a wierd number when fd is -1 2008-08-17 23:51 it's strange that printf doesn't support signed hexadecimal 2008-08-17 23:54 ok, fdsize64 now returns zero when it fails, better than overloading the size with -1, or maybe not 2008-08-17 23:54 anyway, inode.c then proceeds to fail on the leaf 2008-08-17 23:54 I wonder if those failure paths are decent 2008-08-17 23:55 probe needs to fail and not think it got a leaf 2008-08-17 23:55 hrm signed hex 2008-08-17 23:56 makes sense to me 2008-08-17 23:56 i suppose 2008-08-17 23:57 why doesn't fdsize64 just take a u64* 
2008-08-17 23:57 it should 2008-08-17 23:57 and return err 2008-08-17 23:57 I thought I made it do that 2008-08-17 23:57 but the patch probably got danked 2008-08-17 23:57 like the ioctl does 2008-08-18 00:01 i thought i remember seeing that too 2008-08-18 00:13 [18233] main: fdsize64 failed for '(null)' (Bad file descriptor) 2008-08-18 00:14 better 2008-08-18 00:14 I'm still interested in the behavior when it tries to keep running anyway 2008-08-18 00:14 I'll just change the error() to warn() 2008-08-18 00:16 int fdsize64(int fd, uint64_t *size) 2008-08-18 00:16 { 2008-08-18 00:16 struct stat stat; 2008-08-18 00:16 if (fstat(fd, &stat)) 2008-08-18 00:16 return -errno; 2008-08-18 00:16 if (S_ISREG(stat.st_mode)) { 2008-08-18 00:16 *size = stat.st_size; 2008-08-18 00:16 return 0; 2008-08-18 00:16 } 2008-08-18 00:16 return ioctl(fd, BLKGETSIZE64, size) ? -errno : 0; 2008-08-18 00:16 } 2008-08-18 00:16 maybe it should just return -1 like other libc stuff 2008-08-18 00:16 yes it should 2008-08-18 00:17 int fdsize64(int fd, uint64_t *size) 2008-08-18 00:17 { 2008-08-18 00:17 struct stat stat; 2008-08-18 00:17 if (fstat(fd, &stat)) 2008-08-18 00:17 return -1; 2008-08-18 00:18 if (S_ISREG(stat.st_mode)) { 2008-08-18 00:18 *size = stat.st_size; 2008-08-18 00:18 return 0; 2008-08-18 00:18 } 2008-08-18 00:18 return ioctl(fd, BLKGETSIZE64, size); 2008-08-18 00:18 } 2008-08-18 00:29 lgtm 2008-08-18 00:30 flips: did you see my patch on the list 2008-08-18 00:30 not yet 2008-08-18 00:30 makefile update + more 64 bit wanrings 2008-08-18 00:31 ah 2008-08-18 00:33 needs a little merge lovin 2008-08-18 00:34 it wouldn't if you committed more often ;) 2008-08-18 00:34 its been 8 hrs.. 
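For reference, the final `fdsize64` from the log made compilable on its own: it needs `<sys/stat.h>` for `fstat`, `<sys/ioctl.h>` for `ioctl`, and `<linux/fs.h>` for `BLKGETSIZE64`. The `size_of_path` caller is hypothetical; it only illustrates the warn-and-keep-going handling discussed above, not actual tux3 code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKGETSIZE64 */

/* Size in bytes of a regular file or block device. Returns 0 on
 * success, or -1 with errno set, like other libc calls. */
int fdsize64(int fd, uint64_t *size)
{
	struct stat st;
	if (fstat(fd, &st))
		return -1;
	if (S_ISREG(st.st_mode)) {
		*size = st.st_size;
		return 0;
	}
	/* Not a regular file: ask the block layer. Fails with -1
	 * (typically ENOTTY) if fd is not a block device. */
	return ioctl(fd, BLKGETSIZE64, size);
}

/* Hypothetical caller: warn instead of dying when the size is unknown. */
int size_of_path(const char *name, uint64_t *size)
{
	assert(name && size);
	int fd = open(name, O_RDONLY);
	if (fd < 0 || fdsize64(fd, size)) {
		fprintf(stderr, "warning: fdsize64 failed for '%s' (%s)\n",
			name, strerror(errno));
		if (fd >= 0)
			close(fd);
		return -1;
	}
	close(fd);
	return 0;
}
```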
cmon, yer slackin 2008-08-18 00:35 you erred 2008-08-18 00:35 it's been 2 minutes 2008-08-18 00:35 ah my cron job hasn't run 2008-08-18 00:35 damn it 2008-08-18 00:35 thats it, every minute 2008-08-18 00:37 um 2008-08-18 00:38 please, I am monitoring the http accesses 2008-08-18 00:38 you will dominate 2008-08-18 00:38 you now have 8, 1 minute apart 2008-08-18 00:39 there has to be an event driven way to do this 2008-08-18 00:39 I thought that was what rss was about 2008-08-18 00:41 if your server can't take 1 pull per minute from me, that is sad 2008-08-18 00:41 rss is bs 2008-08-18 00:41 its just an xml formated page 2008-08-18 00:41 it can 2008-08-18 00:41 ajax is supposedly the browser "push" technology 2008-08-18 00:41 it's my eyes when I scan the log 2008-08-18 00:42 I see all these pulls 2008-08-18 00:42 but its really just javascript'ed pulls 2008-08-18 00:42 of more xml 2008-08-18 00:42 lame 2008-08-18 00:42 stupid heavy format 2008-08-18 00:42 yep, polling is the way of the web 2008-08-18 00:43 hairy footed hippies on the steering committee methinks 2008-08-18 00:45 there, shapor.com seems to have mellowed a little 2008-08-18 00:45 how could I notify your pull if I wanted to, email? 2008-08-18 00:45 do some magic http access to your server? 2008-08-18 00:46 i could just provide a repo you could push to 2008-08-18 00:46 why bother with the overhead of requesting a pull ? 2008-08-18 00:46 then you poll it ;-) 2008-08-18 00:46 sure 2008-08-18 00:46 we should do that 2008-08-18 00:46 well yeah, poll on the local box 2008-08-18 00:47 I want auto-push 2008-08-18 00:47 does hg do it? 2008-08-18 00:47 doubtful 2008-08-18 00:47 hrm perhaps 2008-08-18 00:47 well you could poll locally too :P 2008-08-18 00:47 what I was thinking and making funny faces 2008-08-18 00:48 and I am supposed to fix inotify ;-) 2008-08-18 00:48 heh 2008-08-18 00:48 oh right.. for the kde guys? 
2008-08-18 00:48 it works well enough 2008-08-18 00:48 right 2008-08-18 00:48 for this purpose 2008-08-18 00:48 they say it doesn't 2008-08-18 00:48 perhaps for this purpose 2008-08-18 00:49 hrm an hg watcher that pushes upstream on local commits would be useful 2008-08-18 00:49 could support git also 2008-08-18 00:49 let's get our whine in 2008-08-18 00:49 why whine? 2008-08-18 00:50 you'll just code it I know 2008-08-18 00:50 no whining 2008-08-18 00:50 it would be good 2008-08-18 00:50 some way you just say source/target like zumastor 2008-08-18 00:50 and have replicated nets of source code 2008-08-18 00:50 yeah certainly needs thought/first stab at it before whining 2008-08-18 00:51 bitbucket/github guys would be interested probably 2008-08-18 00:51 well, starting to get late 2008-08-18 00:51 i wonder if they have any programs to do it already 2008-08-18 00:51 not even 1 yet 2008-08-18 00:51 probably matt is no dummy 2008-08-18 00:52 I was just in the middle of setting up some btree unit testing 2008-08-18 00:52 got tired of testing btree in the live app 2008-08-18 00:52 things like advance should be tested in isolation 2008-08-18 00:52 and lots of other things 2008-08-18 00:52 if btrees are ever expected to be solid 2008-08-18 00:58 ok i cleaned up the 64 bit warning on the most recent code ;) 2008-08-18 00:58 patch? 2008-08-18 00:58 mailed 2008-08-18 00:58 this is already getting old 2008-08-18 00:58 right, I should pull from you 2008-08-18 00:59 i need to learn how to make a tree public with hg 2008-08-18 00:59 tomorrow 2008-08-18 01:00 I'll just look in my httpd.conf when you're ready 2008-08-18 01:01 basically nothing to do 2008-08-18 01:01 just give access to it 2008-08-18 01:02 if I can see the repo directory I can pull 2008-08-18 01:02 and unlike git this is efficient 2008-08-18 01:03 ah cool 2008-08-18 01:04 that is easy for me to do, just create a symlink 2008-08-18 01:04 yes 2008-08-18 01:04 the .hg dir? 2008-08-18 01:04 or the root of my checkout? 
2008-08-18 01:04 the root I think 2008-08-18 01:04 let me check 2008-08-18 01:05 the root of your repo, not the .hg 2008-08-18 01:05 http://shapor.com/tux3/hg/ 2008-08-18 01:06 we shoulda used your last patch to test 2008-08-18 01:06 i canlt clone from it though 2008-08-18 01:06 are you sure its that simple? 2008-08-18 01:06 can't* 2008-08-18 01:06 yes 2008-08-18 01:07 you should be able to clone 2008-08-18 01:07 abort: 'http://shapor.com/tux3/hg/' does not appear to be an hg repository! 2008-08-18 01:07 does it have a .hg? 2008-08-18 01:07 yes 2008-08-18 01:07 http://shapor.com/tux3/hg/.hg/ 2008-08-18 01:07 just a sec 2008-08-18 01:09 i dont think its as simple as you say it is 2008-08-18 01:10 ah, you can clone like that 2008-08-18 01:10 ah 2008-08-18 01:10 but not pull 2008-08-18 01:10 yeah 2008-08-18 01:11 er are you sure? 2008-08-18 01:11 i think you can 2008-08-18 01:11 ah 2008-08-18 01:11 static-http: 2008-08-18 01:11 you can pull after you clone ;-) 2008-08-18 01:11 no you can't 2008-08-18 01:11 http://www.selenic.com/mercurial/wiki/index.cgi/StaticHTTP 2008-08-18 01:11 clone is happy to clone, but pull says no repo, bad message 2008-08-18 01:12 are you using static-http ? 2008-08-18 01:12 static wha? 2008-08-18 01:12 hg pull 2008-08-18 01:12 pulling from static-http://shapor.com/tux3/hg 2008-08-18 01:13 you can't just use the http: prefix 2008-08-18 01:13 that expect a mercurial cgi script on the other end 2008-08-18 01:13 static-http: prefix expects just the regular old files 2008-08-18 01:13 on the other end 2008-08-18 01:13 see the link i sent above 2008-08-18 01:13 ok, here goes 2008-08-18 01:14 hg pull static-http://shapor.com/tux3/hg 2008-08-18 01:14 abort: no repo found! 2008-08-18 01:16 you have to be in an hg tree 2008-08-18 01:16 hg clone static-http://shapor.com/tux3/hg <- works 2008-08-18 01:16 run that pull command inside your tux3 dir 2008-08-18 01:16 yes that works 2008-08-18 01:16 now... 
it shows as directory hg 2008-08-18 01:17 that is not too good style 2008-08-18 01:17 doh 2008-08-18 01:17 would be better named shapor 2008-08-18 01:17 well tux3 2008-08-18 01:17 better 2008-08-18 01:17 tux3-shapor? 2008-08-18 01:17 sure 2008-08-18 01:18 you can call it whatever you like though 2008-08-18 01:18 http pull this way is efficient enough 2008-08-18 01:18 i think 2008-08-18 01:18 way better than email 2008-08-18 01:18 hell yeah 2008-08-18 01:18 calling it hg is lame ;-) 2008-08-18 01:19 makes sense tux3/hg 2008-08-18 01:19 its the mercurial repo for tux3 ;) 2008-08-18 01:19 renamed to shapor-tux3 2008-08-18 01:21 ok i commited a change 2008-08-18 01:21 pull from me 2008-08-18 01:22 dinner time here 2008-08-18 01:22 ok 2008-08-18 01:22 hg pull static-http://shapor.com/tux3-shapor 2008-08-18 01:22 abort: HTTP Error 403: Forbidden 2008-08-18 01:23 hg pull static-http://shapor.com/tux3/shapor-tux3 2008-08-18 01:25 worked 2008-08-18 01:25 now I need to see the diff 2008-08-18 01:26 oh cool, hg has support for all kinds of hooks 2008-08-18 01:26 look in man hgrc 2008-08-18 01:26 under "hooks" 2008-08-18 01:27 yes, what an excellent use of time 2008-08-18 01:27 ok, I need to pay attention the the family now 2008-08-18 01:27 catch ya tomorrow 2008-08-18 02:50 -!- pgquiles(~pgquiles@172.Red-83-38-37.dynamicIP.rima-tde.net) has joined #tux3 2008-08-18 10:11 -!- pgquiles(~pgquiles@251.Red-81-37-107.dynamicIP.rima-tde.net) has joined #tux3 2008-08-18 12:28 So... is tux3 going to really be a "ZFS killer?" 
2008-08-18 14:23 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-18 14:28 boom, zfs is self killing imho 2008-08-18 14:28 "zfs is a rampant layer violation" -- akpm 2008-08-18 14:28 tux3 + lvm3 will cover the checkbox items of zfs 2008-08-18 14:29 not that I am wildly excited about the idea of checksumming metadata, but tux3 will eventually have that too, as an option 2008-08-18 14:30 tux3 will have the immense advantage of running on Linux 2008-08-18 14:30 as long as Sun keeps being idiotic about the zfs license, zfs will not 2008-08-18 15:03 flips: The licensing was something that had me sort of confused. Is Linux's policy to never include non-GPL'd code in the kernel itself? 2008-08-18 15:13 more specific than that: the code has to be GPL v2 2008-08-18 15:13 v3 will not do 2008-08-18 15:13 Really..? 2008-08-18 15:13 Whose decision was that, Linus'? 2008-08-18 15:13 really 2008-08-18 15:14 Linus says he can't change his mind because every copyright holder would have to agree 2008-08-18 15:14 there are hundreds, some of them even died 2008-08-18 15:14 Okay, so even still, if the code is available (as it is for freebsd) what's preventing someone from just building and loading it as a third party module? 2008-08-18 15:15 you can, but binary modules without proper license are in a legal gray zone 2008-08-18 15:15 the code will certainly not go into mainline until Sun adds a GPL v2 license 2008-08-18 15:15 Isn't that "gray zone" what's currently being filled by things like nvidia's binary driver? 2008-08-18 15:16 nvidia's driver has caused all kinds of problems 2008-08-18 15:16 being occupied by is a better term than being filled by 2008-08-18 15:16 filled sounds like satisfies 2008-08-18 15:16 That's true 2008-08-18 15:17 And would tux3 be GPL2, then? 2008-08-18 15:17 user space code is gpl v3, kernel code is gpl v2 2008-08-18 15:17 Okay, thanks for the info.
2008-08-18 15:18 I've gotta say, though, I've developed a growing respect for BSD-licensed code 2008-08-18 15:18 say, that reminds me, it is about time to collect my beer from Eben 2008-08-18 15:18 bsd is awesome 2008-08-18 15:18 I'm sure that would open you guys up to all kinds of abuse from the rest of capitalism though 2008-08-18 15:18 what would? 2008-08-18 15:18 oh 2008-08-18 15:18 bsd 2008-08-18 15:18 yes, I am not in this to give code to msoft 2008-08-18 15:19 I can support that. 2008-08-18 15:19 Hah. 2008-08-18 15:20 I wasted six hours today on their asses. 2008-08-18 15:21 Do you guys have a projected timeline? 2008-08-18 15:23 a draft roadmap is on the mailing list, revised one will go up later today 2008-08-18 15:25 Ah, I suppose I should subscribe to that bad boy. 2008-08-18 15:25 ;-) 2008-08-18 15:46 welcome to tux3 2008-08-18 15:51 Many thanks. 2008-08-18 15:51 np 2008-08-18 16:00 g99 -g -Wall buffer.c diskio.c btree.c && ./a.out foodev 2008-08-18 16:00 root at 0 2008-08-18 16:00 leaf at 1 2008-08-18 16:00 btree leaf with 0 entries 2008-08-18 16:00 leaf free = 3c 2008-08-18 16:00 btree unit tests starting to happen 2008-08-18 16:00 little leaves to make bushy trees 2008-08-18 17:25 folks ;0 2008-08-18 17:25 :) 2008-08-18 17:25 ACTION finds that the backlog has been truncated 2008-08-18 17:44 -!- konrad(~konrad@c-24-16-74-109.hsd1.wa.comcast.net) has joined #tux3 2008-08-18 22:48 hey shapor 2008-08-18 22:48 the btree unit test produces buggy results 2008-08-18 22:48 interested? 2008-08-18 22:48 seems I broke it on the port from ddsnap 2008-08-18 22:49 bh, you're coming up for air 2008-08-18 22:49 ? 2008-08-18 23:02 eh ? 2008-08-18 23:02 what's air ? 2008-08-18 23:03 just working late at night as usual 2008-08-18 23:54 flips: ah i see 2008-08-18 23:55 the insert after the split inserts to the wrong one 2008-08-18 23:55 right 2008-08-18 23:55 know why yet? 
2008-08-18 23:55 didn't look at the code 2008-08-18 23:55 I was just going in there 2008-08-18 23:56 i'm guessing similar to the fleaf, er whatever its called now, dleaf? 2008-08-18 23:56 it was broken in the same way initially i think 2008-08-18 23:56 of course it works in ddsnap, I broke it :-/ 2008-08-18 23:56 unit tests noticed 2008-08-19 00:04 test for swap was the wrong direction 2008-08-19 00:06 Results 1 - 10 of about 217,000 for tux3 2008-08-19 00:07 time to show some results then 2008-08-19 00:10 ah, there is another bug 2008-08-19 00:10 I also decided to have the leafbuf be the last element in the tree path instead of handled separately 2008-08-19 00:10 that means it is released by brelse_path 2008-08-19 00:11 but tree_expand needs to store it in the path and doesn't 2008-08-19 00:17 hey flips 2008-08-19 00:17 it was nice meeting up with you Friday 2008-08-19 00:17 that was fun 2008-08-19 00:17 when are you up in la, ever? 2008-08-19 00:18 when I'm speeding through in my VW to head to Mountain View 2008-08-19 00:18 I'm going to do it again later on this week for Burning Man 2008-08-19 00:18 well feel free to slow down for a pit stop 2008-08-19 00:19 yeah, now that I know it's that easy to hook up with you folks, yeah 2008-08-19 00:19 maybe do that for a fuel stop or something like that. 2008-08-19 00:19 btw, LA's club scene is a bit weird, haven't figured it out yet 2008-08-19 00:19 ACTION chill's with Goth/Industrial folks 2008-08-19 00:19 haven't been to a club in la 2008-08-19 00:19 daddy thing does that 2008-08-19 00:20 in the SF area that's actually a professional nerd scene 2008-08-19 00:20 I did goth while in berlin 2008-08-19 00:20 my wife really got into it 2008-08-19 00:20 the previous ATM maintainer is a significant DJ in that scene 2008-08-19 00:20 some MIT Media Lab folks, etc... 
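The actual fix isn't quoted in the log, but the general shape of "insert after split" is easy to sketch: once the leaf is split, the new key must go to whichever half covers it. This uses a hypothetical toy leaf of sorted unsigned keys with a four-entry capacity, not tux3's leaf format.

```c
#include <assert.h>
#include <string.h>

#define LEAF_MAX 4  /* hypothetical tiny capacity to force splits */

struct leaf { unsigned count; unsigned keys[LEAF_MAX]; };

/* Move the upper half of `from` into the empty leaf `into` and return
 * the first key of `into`, which becomes the separator in the parent. */
unsigned leaf_split(struct leaf *from, struct leaf *into)
{
	unsigned half = from->count / 2;
	into->count = from->count - half;
	memcpy(into->keys, from->keys + half, into->count * sizeof into->keys[0]);
	from->count = half;
	return into->keys[0];
}

/* Insert a key into a non-full leaf, keeping keys sorted. */
void leaf_insert(struct leaf *leaf, unsigned key)
{
	assert(leaf->count < LEAF_MAX);
	unsigned i = leaf->count;
	while (i && leaf->keys[i - 1] > key) {
		leaf->keys[i] = leaf->keys[i - 1];
		i--;
	}
	leaf->keys[i] = key;
	leaf->count++;
}

/* Split a full leaf, then insert into the half whose range covers
 * `key`: keys at or above the separator go right. Getting this
 * comparison backwards inserts into the wrong half. */
unsigned split_and_insert(struct leaf *left, struct leaf *right, unsigned key)
{
	unsigned sep = leaf_split(left, right);
	leaf_insert(key >= sep ? right : left, key);
	return sep;
}
```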
2008-08-19 00:20 got some nice pics of us gothing 2008-08-19 00:20 oh really, funny 2008-08-19 00:20 you and wli should get together then :) 2008-08-19 00:20 ah wli 2008-08-19 00:20 he's more of that S&M type, I just like the music 2008-08-19 00:20 wow didn't know 2008-08-19 00:21 harald welte is a serious goth 2008-08-19 00:21 oh yeah, big time dude 2008-08-19 00:21 ran into him in a goth club in berlin, we both said what are you doing here 2008-08-19 00:21 disproportionate engineering and science folks are goth 2008-08-19 00:21 haha 2008-08-19 00:21 that's funnny 2008-08-19 00:21 my ex is a material scientist and love stuff like Joy Division and stuff 2008-08-19 00:21 small world 2008-08-19 00:22 yeah, folks in the SF scene know who I am for the most part, but not what I do per se 2008-08-19 00:22 they knew me when I was starting to complain about how irritating NetApp was and stuff ;) 2008-08-19 00:23 bitched me out when NetApp filed a lawsuit against Sun :) 2008-08-19 00:23 funny 2008-08-19 00:25 netapp should just stick to making money 2008-08-19 00:25 and not making people mad at them 2008-08-19 00:26 well, I fault Sun in this battle 2008-08-19 00:26 q: does ERR_PTR work out ok in userspace? 2008-08-19 00:26 they should have layed off 2008-08-19 00:26 I think I'd like to overload some pointer returns with (negative) error numbers 2008-08-19 00:27 sun probably thinks netapp's claim is weak 2008-08-19 00:27 I'd bet with sun on that 2008-08-19 00:27 well, that's for the courts to decide, but it was because Sun's lawyers stopped talking them is why they eventually filed the lawsuit 2008-08-19 00:28 that's publically known 2008-08-19 00:28 ok, I didn't know 2008-08-19 00:28 hard to know what happened with all the he said she said 2008-08-19 00:28 so really, in this industry with how patents are set up, they really had to cross sue Sun. 
They filed the lawsuit in a way intentionally so that Sun would also have to cross sue them 2008-08-19 00:29 It's in Dave Hitz's blog 2008-08-19 00:29 I can introduce you to those folks the next time you're up if you want 2008-08-19 00:29 hey how about applying your considerable intellect to the question of whether ERR_PTR is ok to use in userspace 2008-08-19 00:29 I know most of those folks well 2008-08-19 00:29 and I'm sure they'd like to talk to you out of curiosity and stuff 2008-08-19 00:29 maybe one day 2008-08-19 00:29 flips: I know nothing about userspace/kernel space boundary stuff, sorry 2008-08-19 00:30 has nothing to do with kernel 2008-08-19 00:30 well, the next time you head to Mountain View I can set something up for you folks 2008-08-19 00:30 has everything to do with memory mapps 2008-08-19 00:30 in userspace 2008-08-19 00:30 yeah, I'm retarded about this stuff, looking at a latency_trace now to see why the reschedule is taking so long 2008-08-19 00:30 btw, stay away from things like bit spins 2008-08-19 00:31 well, we will get to locking questions pretty soon 2008-08-19 00:31 bit spin... ok 2008-08-19 00:31 talk to rostedt if you have any unclarity about that 2008-08-19 00:31 always was suspicious about that 2008-08-19 00:31 lock_page is a bit spin 2008-08-19 00:31 that is used heavily 2008-08-19 00:31 but the current rwlock implementation sort of a miracle 2008-08-19 00:31 really readly really heavily 2008-08-19 00:31 really good work done by rostedt 2008-08-19 00:31 nice 2008-08-19 00:32 anything that's atomic is f-ed in -rt 2008-08-19 00:32 rwspinlock, right? 
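On the ERR_PTR question: the kernel trick carries over to Linux userspace as long as no valid object can live in the last MAX_ERRNO (4095) bytes of the address space. That holds for ordinary Linux processes, but it is an assumption about the platform, not something ISO C promises (pointer/integer casts are implementation-defined). A minimal sketch; the range check inside `ERR_PTR` is an addition, and `get_block` is a hypothetical caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: the top MAX_ERRNO addresses are never valid user pointers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO); /* sanity check, not in the kernel */
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr falls in the last MAX_ERRNO bytes, i.e. encodes -errno. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lookup returning either a buffer or an encoded -errno. */
void *get_block(unsigned long block, unsigned long nblocks)
{
	if (block >= nblocks)
		return ERR_PTR(-EINVAL);
	void *data = calloc(1, 4096);
	return data ? data : ERR_PTR(-ENOMEM);
}
```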
2008-08-19 00:32 make sure that you don't hold those locks for that long 2008-08-19 00:32 I'm pretty good about that 2008-08-19 00:32 only things like timers and the scheduler rq turn off interrupts and rescheduling for relatively long periods of time 2008-08-19 00:32 usually just take a spin lock long enough to get some other synchronizer set up 2008-08-19 00:33 all of that has been type redefined to be backed by a variant of the rtmutex 2008-08-19 00:33 so things like spinlocks are actually mutexes with the ability to sleep across BKL and still have it be persistently held to maintain correctness 2008-08-19 00:33 I'm wondering if I should get some multithreading happening in the userspace code 2008-08-19 00:33 semantic correctness 2008-08-19 00:33 get the locks at least partially sorted in userspace 2008-08-19 00:33 using futexes 2008-08-19 00:33 yeah, that might be useful for a mock up 2008-08-19 00:34 ACTION needs to get back to work 2008-08-19 00:34 the alternative is to skip that and just do that part in the kernel port 2008-08-19 00:34 btw, one of the Coverity owners is a Goth 2008-08-19 00:34 and a Stanford CSE professor 2008-08-19 00:34 they hang out near goog in mtv 2008-08-19 00:35 that was on the hiring committee for Sebastian Thrun (sp?) Grand Challenge winner 2008-08-19 00:35 dawson somebody 2008-08-19 00:35 engler 2008-08-19 00:35 nice dude, I gave him Burning Man advice a year ago :) 2008-08-19 00:35 had a good time 2008-08-19 00:35 hiring? 2008-08-19 00:35 what kind of advice does one need for burning man? 2008-08-19 00:36 hiring committee for Stanford CSS 2008-08-19 00:36 CSE department 2008-08-19 00:36 "watch out for the brown tabs" 2008-08-19 00:36 flips: how to have a good time what to look out for, etc...
2008-08-19 00:36 haha 2008-08-19 00:36 floppy naked chicks 2008-08-19 00:36 on bikes 2008-08-19 00:36 sounds, um, athletic 2008-08-19 00:37 well btree leaf ops are functioning ok 2008-08-19 00:37 one issue: inserting keys in sorted order results in many half full leaves 2008-08-19 00:37 because after a leaf is split it never gets inserted into again 2008-08-19 00:38 there must be something clever to do about that 2008-08-19 00:38 ok a node in the b-tree represents a file right ? 2008-08-19 00:38 and you put the versioning information at that node ? 2008-08-19 00:38 some btrees are inode table blocks, some are file indexes 2008-08-19 00:38 how are indirect blocks dumped into that ? 2008-08-19 00:38 a leaf in a btree gets the versioned pointers 2008-08-19 00:38 the btree is the indirect block stuff 2008-08-19 00:39 oh shit, now I get it 2008-08-19 00:39 that's what I was wondering about 2008-08-19 00:39 so the time space trade off is really all about the b-tree and the metadata shoved into it 2008-08-19 00:39 two levels of trees 1) inode table 2) file index 2008-08-19 00:39 is that a correct understanding ? or am I just lost ? 2008-08-19 00:40 btrees are fairly efficient space wise 2008-08-19 00:40 not as efficient as a classic ufs radix tree for an index 2008-08-19 00:40 is my articulation accurate regarding your FS ? 
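One well-known remedy for the half-full leaves left by ascending inserts (not necessarily what tux3 adopted) is to detect an append at the end of the leaf and split unevenly: keep the old leaf full and start the new leaf empty. The sketch simulates ascending inserts into leaves of a hypothetical fixed capacity to compare the two split policies.

```c
#include <assert.h>
#include <stdbool.h>

/* How many of `count` entries stay in the left leaf when a full leaf
 * is split to insert at position `at`. */
unsigned split_point(unsigned count, unsigned at, bool append_aware)
{
	if (append_aware && at == count)
		return count;   /* appending: keep the old leaf full */
	return count / 2;   /* otherwise: classic midpoint split */
}

/* Count the leaves produced by inserting keys 0..nkeys-1 in ascending
 * order. Such inserts always land at the end of the rightmost leaf, so
 * tracking that leaf's entry count is enough. */
unsigned leaves_after_sorted_inserts(unsigned nkeys, unsigned capacity, bool append_aware)
{
	assert(capacity >= 2);
	unsigned leaves = 1, last = 0; /* entries in the rightmost leaf */
	for (unsigned k = 0; k < nkeys; k++) {
		if (last == capacity) {
			/* entries that stay behind leave the rightmost leaf */
			last -= split_point(last, last, append_aware);
			leaves++;
		}
		last++;
	}
	return leaves;
}
```

With a capacity of 4 and 16 ascending keys, the midpoint policy yields 7 leaves (six of them half full), while the append-aware policy yields 4 full leaves. Inserts in the middle of a leaf still split evenly.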
2008-08-19 00:40 yes 2008-08-19 00:40 I get it 2008-08-19 00:40 fuck, wow 2008-08-19 00:40 I didn't at our conversation, but I do now after talking to you and reading the posts 2008-08-19 00:40 it's more efficient to have a bunch of versioned pointers at the leaves of btrees than to be constantly rewrite tree nodes 2008-08-19 00:40 in theory 2008-08-19 00:41 yeah, you'll be able to do all sorts of funky things with it 2008-08-19 00:41 probably 2008-08-19 00:41 there was an OLS paper that talked about something similar actually 2008-08-19 00:41 2005 2008-08-19 00:41 2006 2008-08-19 00:41 would be interesting to see 2008-08-19 00:41 usign some kind of things like what you're talking about but to do DB kind of stuff with file metadata 2008-08-19 00:41 I didn't get the proceedings that year 2008-08-19 00:42 you could take a jpg or something and have a different header or something like that 2008-08-19 00:42 it should be online regardless 2008-08-19 00:42 heh 2008-08-19 00:42 well you could use versioning for that 2008-08-19 00:42 which potentially a powerful thing 2008-08-19 00:42 yes 2008-08-19 00:42 but I'm being fairly unimagination and just using it to implement posix and versioning 2008-08-19 00:42 yeah, I just got your idea, I'm half blow away by it 2008-08-19 00:42 good night's work then 2008-08-19 00:43 blown 2008-08-19 00:43 holy shit 2008-08-19 00:43 this could potentially smoke zfs since it's so rigid 2008-08-19 00:43 you can do all sorts of fucking things with those b-tree nodes 2008-08-19 00:43 am I right ? 
2008-08-19 00:45 right 2008-08-19 00:45 it's about one zillion times more compact than zfs 2008-08-19 00:45 yes, wow 2008-08-19 00:45 it's brilliant 2008-08-19 00:48 there are some interesting things being implemented in the inode table leaves 2008-08-19 00:48 the file leaves aren't going to get much fancier 2008-08-19 00:48 they're already pretty darn fancy 2008-08-19 00:49 see dleaf.c 2008-08-19 00:49 insane 2008-08-19 00:49 ok, have to go do work 2008-08-19 00:49 later 2008-08-19 00:49 good luck 2008-08-19 00:49 bye 2008-08-19 01:01 shapor, there's another bug 2008-08-19 01:02 the unit test adds a tree level and should not 2008-08-19 01:08 another bug: some buffers not getting released 2008-08-19 01:38 bogus buffer counts are gone 2008-08-19 01:38 now about that bogus level add 2008-08-19 01:39 sb->entries_per_node wasn't set 2008-08-19 02:11 flips: how are you dealing with concurrency issues with b-tree access ? 2008-08-19 02:11 you'll be doing a lot of reads to that tree and it's got to be able to do it quickly 2008-08-19 02:12 start with a single btree mutex then make it more granular 2008-08-19 02:12 when probing, drop the lock on the level above each time it goes deeper 2008-08-19 02:12 so just the leaf ends up locked 2008-08-19 02:13 if there's a better idea, whack me 2008-08-19 02:13 have you thought about using rcu instead for the read-sides ? 2008-08-19 02:14 what about write coherency in that tree across some kind of atomic sync ? 
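The probe described above (lock the next level, then drop the level above, so only the leaf stays locked) is classic lock coupling, also called hand-over-hand locking or crabbing. A minimal userspace sketch with one pthread mutex per node; the node layout is hypothetical, and only the read-side descent is shown. A writer that may split must keep the parent locked until it knows the child will not split.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define FANOUT 4  /* hypothetical tiny fanout */

struct node {
	pthread_mutex_t lock;
	unsigned count;             /* children in use (index nodes only) */
	unsigned keys[FANOUT];      /* keys[i]: lowest key under child[i] */
	struct node *child[FANOUT]; /* all NULL in a leaf */
};

void node_init(struct node *n)
{
	pthread_mutex_init(&n->lock, NULL);
	n->count = 0;
	for (int i = 0; i < FANOUT; i++) {
		n->keys[i] = 0;
		n->child[i] = NULL;
	}
}

/* Descend to the leaf covering `key`. The child is locked before the
 * parent is released, so the parent's pointer cannot change under us,
 * yet at most two levels are ever held. Returns the leaf, still locked. */
struct node *probe_locked(struct node *root, unsigned key)
{
	struct node *node = root;
	pthread_mutex_lock(&node->lock);
	while (node->child[0]) {
		assert(node->count > 0);
		unsigned i = node->count - 1;
		while (i > 0 && node->keys[i] > key)
			i--;
		struct node *next = node->child[i];
		pthread_mutex_lock(&next->lock);   /* take the child first... */
		pthread_mutex_unlock(&node->lock); /* ...then drop the parent */
		node = next;
	}
	return node;
}
```

Build with `-pthread`.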
2008-08-19 02:14 yes I have 2008-08-19 02:14 -!- konrad(~konrad@c-24-16-74-109.hsd1.mn.comcast.net) has joined #tux3 2008-08-19 02:14 not really deeply though 2008-08-19 02:14 it's not a great rcu candidate 2008-08-19 02:14 the granularity issues are tricky and you could be stuck with a contention issue accessing that tree 2008-08-19 02:14 writing has to be really efficient too 2008-08-19 02:15 and rcu pukes pretty badly for writing 2008-08-19 02:15 yeah, I know 2008-08-19 02:15 I think the contention will be pretty good, provided the locks are pushed down the tree 2008-08-19 02:15 I have another trick too 2008-08-19 02:15 cursors 2008-08-19 02:15 a cursor is a probe path into the btree that isn't released 2008-08-19 02:16 you can't have a top level lock or else you'll run into things like the radix tree stuff for the page cache right ? 2008-08-19 02:16 to change the higher level blocks that a cursor owns you have to get everybody to release their cursor 2008-08-19 02:16 and limit yourself to about 2.5 processors for scalability 2008-08-19 02:16 maybe you can push this down to the versioning pointers themselves 2008-08-19 02:16 there won't be a lock inversion with radix tree 2008-08-19 02:17 radix tree lock is taken after btree lock 2008-08-19 02:17 it's not about inversion but contention and cache issues 2008-08-19 02:17 locks are ordered root to leaf in the btree 2008-08-19 02:17 the idea is not to access the root very often 2008-08-19 02:17 that is what the cursors do 2008-08-19 02:17 well, I'd expect top-level locks to really be hammered hard 2008-08-19 02:17 ok 2008-08-19 02:17 what about per cpu locality instead ? 
2008-08-19 02:18 it needs to be stated more precisely 2008-08-19 02:18 so you can rip it apart ;-) 2008-08-19 02:18 localize things on an inode level or something like that with SLAB support 2008-08-19 02:18 one cursor per cpu would be nice 2008-08-19 02:18 flips: just trying to help you think it out, not to rip per se, I want you to succeed 2008-08-19 02:18 top level blocks will only change rarely and are possibly rcu candidates 2008-08-19 02:18 I want everybody to succeed :) 2008-08-19 02:18 :-) 2008-08-19 02:19 you might like to talk to peterz about some of these issues 2008-08-19 02:19 he's a tree concurrency expert 2008-08-19 02:19 ok, one concept about these cursors is, you can lop the top levels away from the cursor, so only the deeper levels hold locks 2008-08-19 02:19 then when you need to advance the cursor or something the top level locks are retaken... temporarily 2008-08-19 02:20 yes, peterz would be good 2008-08-19 02:20 I think I ought to port to kernel early and deal with the locking there 2008-08-19 02:20 instead of prototyping that in userspace 2008-08-19 02:20 you see how the locking works in ufs style file indexes? 2008-08-19 02:20 it's cute 2008-08-19 02:20 there is none 2008-08-19 02:21 property of the ind/dind/tind layout 2008-08-19 02:22 flips: you also need to think about file duping 2008-08-19 02:22 ?
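One reading of the "there is none" remark: in a UFS-style index a block's position in the ind/dind/tind tree is fixed purely by arithmetic on its logical block number, so growing a file only fills empty slots and never moves existing pointers the way a btree split does. The mapping can be sketched as below, assuming ext2's layout of 12 direct pointers followed by single, double and triple indirect slots; `ptrs` is the number of pointers per indirect block.

```c
#include <assert.h>

#define NDIR 12  /* direct pointers in the inode, as in ext2 */

/* Translate a logical block into the slot offsets a lookup follows:
 * a direct slot, or the IND/DIND/TIND slot followed by one offset per
 * indirect level. Returns the path depth, or 0 if out of range. */
int block_to_path(unsigned long block, unsigned long ptrs, unsigned long offsets[4])
{
	assert(ptrs > 1);
	if (block < NDIR) {
		offsets[0] = block;
		return 1;
	}
	block -= NDIR;
	if (block < ptrs) {
		offsets[0] = NDIR;         /* IND slot */
		offsets[1] = block;
		return 2;
	}
	block -= ptrs;
	if (block < ptrs * ptrs) {
		offsets[0] = NDIR + 1;     /* DIND slot */
		offsets[1] = block / ptrs;
		offsets[2] = block % ptrs;
		return 3;
	}
	block -= ptrs * ptrs;
	if (block < ptrs * ptrs * ptrs) {
		offsets[0] = NDIR + 2;     /* TIND slot */
		offsets[1] = block / (ptrs * ptrs);
		offsets[2] = block / ptrs % ptrs;
		offsets[3] = block % ptrs;
		return 4;
	}
	return 0;
}
```

For example, with 4 KiB blocks and 32-bit pointers (1024 per block), logical block 6163 resolves to the DIND slot, then offset 5, then offset 7.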
2008-08-19 02:22 particularly de-duping 2008-08-19 02:22 not sure what you mean 2008-08-19 02:22 using a sha1 hash to make sure a file's contents are the same and aren't replicated 2008-08-19 02:22 so just having a pointer to it will do 2008-08-19 02:23 say for backing up a Windows volume and not recopying every fucking .dll constant in the system 2008-08-19 02:23 and other immutable files 2008-08-19 02:23 oh right 2008-08-19 02:23 just something to think about 2008-08-19 02:23 yes 2008-08-19 02:24 also possible to handle that at the volume level 2008-08-19 02:26 well, that's got to be handled in the b-tree as well I'd think since it's your only metadata structure that I know of 2008-08-19 02:27 the volume manager can pretend it's giving different blocks to the filesystem when they are actually the same 2008-08-19 02:28 then probably you need reference counting at some level 2008-08-19 02:28 venti and stuff like that 2008-08-19 02:31 well, the metadata grows as you add more functionality, so packing becomes important 2008-08-19 02:32 if I'm coming up with uninteresting things please tell me and I'll shut up 2008-08-19 02:32 you aren't suggesting looking for identical metadata blocks? 2008-08-19 02:33 but having something that can also vary the flatness of a particular file would also be useful like for video applications 2008-08-19 02:33 you could represent discontiguous spans using a special indirect block or something and describe the spans using an extent (?) 2008-08-19 02:33 indirect pointer I mean 2008-08-19 02:33 er, no, block 2008-08-19 02:34 flips: I'm suggesting whatever will work 2008-08-19 02:34 there are also to be extents 2008-08-19 02:34 extents will really flatten things 2008-08-19 02:34 so spans could be represented by an extent right ? 2008-08-19 02:34 ok, good 2008-08-19 02:34 yes 2008-08-19 02:34 sparse extents too 2008-08-19 02:34 good 2008-08-19 02:35 ok, am I raising interesting points or not ?
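To make the extent discussion concrete: an extent replaces a run of block pointers with a single (logical start, physical start, length) record, and a sparse file is simply one whose extents leave gaps. A minimal lookup sketch; the struct layout is hypothetical, not tux3's on-disk format, and it assumes physical block 0 never holds file data so that 0 can signal a hole.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: `count` logical blocks starting at `logical`
 * map to consecutive physical blocks starting at `physical`. */
struct extent {
	uint64_t logical;
	uint64_t physical;
	uint32_t count;
};

/* Binary search a sorted, non-overlapping extent list. Returns the
 * physical block, or 0 if `block` falls in a hole. */
uint64_t extent_map(const struct extent *ext, size_t n, uint64_t block)
{
	size_t lo = 0, hi = n;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		assert(ext[mid].count > 0); /* zero-length extents are malformed */
		if (block < ext[mid].logical)
			hi = mid;
		else if (block - ext[mid].logical >= ext[mid].count)
			lo = mid + 1;
		else
			return ext[mid].physical + (block - ext[mid].logical);
	}
	return 0;
}
```

Here two extents covering blocks 0 to 7 and 100 to 149 stand in for 58 individual block pointers, and everything between them is a hole.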
2008-08-19 02:35 oh yes 2008-08-19 02:35 especially the locking 2008-08-19 02:35 I need to make a specific proposal 2008-08-19 02:35 starting from easy and moving to efficient 2008-08-19 02:35 well, the b-tree thing is so obvious yet so powerful I'm surprised that somebody hasn't tried this already 2008-08-19 02:36 btrfs is btrees, so is zfs 2008-08-19 02:36 but versioning at the leaves is new 2008-08-19 02:36 yeah, but you're using it in a novell way which is why it's interesting to me 2008-08-19 02:36 ok 2008-08-19 02:36 what seems novelle to you? 2008-08-19 02:37 novel 2008-08-19 02:37 a problem with a single big b-tree I would think might be aging elements in memory so that certain frequently used things will be in core for use, like for checking the integrity of a volume without having to load the same indirect pointer again and again 2008-08-19 02:37 flips: using a b-tree generically for all sorts of things 2008-08-19 02:38 generic btrees are new too, right 2008-08-19 02:38 I also don't know as much as you about file systems so my comment could be out of ignorance 2008-08-19 02:38 flips: I'm interested in the power of generic b-trees for all sorts of metadata 2008-08-19 02:38 the buffer cache blocks are lru's 2008-08-19 02:38 lru'd 2008-08-19 02:39 clean, old ones get evicted 2008-08-19 02:39 dirty ones have to be cleaned regularly, that is the atomic commit 2008-08-19 02:39 I will add the third kind of btree probably tomorrow 2008-08-19 02:40 actually, I already added a third kind, the unit test implements a new btree just for testing 2008-08-19 02:40 and to demo what you have to do to specialize the btree 2008-08-19 02:40 what about sensitivity to things like an inode versus indirect versus lower level indirect blocks ? 2008-08-19 02:41 same for all other kinds of metadata 2008-08-19 02:41 there needs to be a kind of ordering or something like that I'd expect 2008-08-19 02:41 for commit? 
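The cache policy described above can be sketched with a doubly linked LRU list: the least recently used clean buffers are evicted, and dirty ones are skipped until a commit writes them out. The structures and function names are hypothetical and not taken from tux3's buffer.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer with LRU links; most recently used at the head. */
struct buffer {
	unsigned long block;
	bool dirty;
	struct buffer *prev, *next;
};

struct lru { struct buffer *head, *tail; };

static void lru_unlink(struct lru *lru, struct buffer *b)
{
	if (b->prev) b->prev->next = b->next; else lru->head = b->next;
	if (b->next) b->next->prev = b->prev; else lru->tail = b->prev;
	b->prev = b->next = NULL;
}

/* Add a buffer that is not yet on the list as most recently used. */
void lru_add(struct lru *lru, struct buffer *b)
{
	b->prev = NULL;
	b->next = lru->head;
	if (lru->head)
		lru->head->prev = b;
	else
		lru->tail = b;
	lru->head = b;
}

/* Record an access to a buffer already on the list. */
void lru_touch(struct lru *lru, struct buffer *b)
{
	assert(lru->head); /* b must already be on the list */
	lru_unlink(lru, b);
	lru_add(lru, b);
}

/* Evict the least recently used clean buffer; dirty buffers must be
 * committed first. Returns NULL if every cached buffer is dirty. */
struct buffer *lru_evict(struct lru *lru)
{
	for (struct buffer *b = lru->tail; b; b = b->prev)
		if (!b->dirty) {
			lru_unlink(lru, b);
			return b;
		}
	return NULL;
}
```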
2008-08-19 02:42 like an NFS use of a volume might be different for a Samba 2008-08-19 02:42 flips: for general reading 2008-08-19 02:42 ...and needs different kinds of metadata loaded and persistent in different ways 2008-08-19 02:42 this why I'm suspicious about the Linux page cache 2008-08-19 02:43 everything is handled the same way 2008-08-19 02:43 in the page cache 2008-08-19 02:43 the aging seems overly simplistic 2008-08-19 02:43 probably is 2008-08-19 02:43 linux kind of sucks there 2008-08-19 02:43 yeah, I've noticed 2008-08-19 02:43 somebody measured and found our pageout performs worse than random 2008-08-19 02:44 bad 2008-08-19 02:44 well my wife is heading to to tomorrow 2008-08-19 02:45 ok 2008-08-19 02:45 and I will drive to the ariport 2008-08-19 02:45 so night then right 2008-08-19 02:45 ? 2008-08-19 02:45 I'll be up still for a few more hours 2008-08-19 02:45 continue anytime ok? 2008-08-19 02:45 sure, I hope it was a useful conversation 2008-08-19 02:45 night for me 2008-08-19 02:45 it was 2008-08-19 02:45 night 2008-08-19 02:45 locking is getting imminent 2008-08-19 02:45 bye 2008-08-19 02:50 you should also think about how to cluster related data together in the b-tree for contigous write allocatin 2008-08-19 02:50 the block allocator is a bitch 2008-08-19 02:50 thinking every much about that 2008-08-19 02:51 I will post some thoughts pretty soon 2008-08-19 02:51 ok, just hope that i'm relevant about this :) 2008-08-19 02:51 inode number targetting is a big part of it 2008-08-19 02:51 oh yes 2008-08-19 02:51 rotating media still rules the wold 2008-08-19 02:51 world 2008-08-19 02:51 because different kinds of metadata need to be treated differently 2008-08-19 02:52 which could be a drawback of having a big b-tree manage all of this 2008-08-19 02:52 I guess you can always dump a shit load of ram into your system as well 2008-08-19 02:52 the allocator will try to places inode table blocks near the directories that link them (note impossibility 
with hard links) and data blocks near the inode table blocks 2008-08-19 02:52 also impossible in general 2008-08-19 02:53 what about what about relate indirect blocks ? 2008-08-19 02:53 and allocation with regards to versioning pointers and that information ? 2008-08-19 02:53 meaning higher level btree blocks 2008-08-19 02:53 as long as I'm asking good questions, I'll not feel like a fucking dork 2008-08-19 02:53 allocation target needs to be derived from the allocation target of the data blocks 2008-08-19 02:54 versioning makes allocation much harder 2008-08-19 02:54 yes 2008-08-19 02:54 very much so 2008-08-19 02:54 because you basically have to store lots of the data in the same place 2008-08-19 02:54 so you'll have to have an upper bounds on the fs for doing this allocation efficiently 2008-08-19 02:54 that is where the idea of generating functions for allocation comes in 2008-08-19 02:54 like a quadratic hash 2008-08-19 02:55 otherwise you'll be running into collisions 2008-08-19 02:55 there will be massive collisions 2008-08-19 02:55 I am aiming to collide elegantly 2008-08-19 02:55 what about allocation maps in the versioning system ? self contained in the b-tree itself ? 2008-08-19 02:55 that is a cool thing about versioned pointers 2008-08-19 02:55 if it's done on per volume basis, it could be a lot of replication 2008-08-19 02:55 you can tell from the versioned pointers what blocks are free 2008-08-19 02:56 right, so it's unified into the algorithm right ? 2008-08-19 02:56 there is just one global free tree for the whole filesystem 2008-08-19 02:56 knowing when to free a block is part of the versioning algorithm, yes 2008-08-19 02:56 it's pretty subtle 2008-08-19 02:56 well, what about fragmentation of that data ? 
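The allocation scheme above has two parts. First, targets are derived from related objects: directory, then inode table block, then data. Second, a "generating function" like a quadratic hash picks where to look when the target is taken. The log does not say which generating function tux3 uses. The sketch below assumes triangular-number probing (goal + i(i+1)/2) over a power-of-two number of blocks, which is known to visit every block exactly once. `allocate` and the plain list bitmap are hypothetical.

```python
def allocate(bitmap, goal):
    """Mark and return a free block near goal, or None if the volume is full.

    Probes goal + i*(i+1)/2 (mod size).  With a power-of-two block count this
    triangular sequence visits every block exactly once.
    """
    size = len(bitmap)
    assert size & (size - 1) == 0, "sketch assumes a power-of-two block count"
    for i in range(size):
        block = (goal + i * (i + 1) // 2) % size
        if not bitmap[block]:
            bitmap[block] = True
            return block
    return None
```

Chaining targets gives the intended locality. The inode table block is allocated with the directory's block as its goal, and the data block with the inode table block as its goal. Collisions push later allocations progressively further away instead of scanning linearly.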
2008-08-19 02:56 about the hardest part actually 2008-08-19 02:56 yes, versioning can fragment stuff 2008-08-19 02:56 you'd generally like to have that easily accessible 2008-08-19 02:56 think of a mysql database with snapshots every 5 minutes 2008-08-19 02:56 wham 2008-08-19 02:57 this conversation is logged right ? 2008-08-19 02:57 I believe so 2008-08-19 02:57 ok, just so that folks can ponder this stuff and come up with answers 2008-08-19 02:57 see tux3bot up them 2008-08-19 02:57 well, the allocation map is a bitch 2008-08-19 02:58 true 2008-08-19 02:58 the bitmap thing is pretty cute 2008-08-19 02:58 your read performance and friends are really tightly connected to how fast you can do a lookup in a b-tree 2008-08-19 02:58 you just reminded me, I can't have the allocation bitmap in my inode table 2008-08-19 02:58 it's global to multiple volumes 2008-08-19 02:59 one crude trick: cache the root of the btree 2008-08-19 02:59 and the 1st level for good measure 2008-08-19 02:59 well, replicated it 2008-08-19 02:59 branching factor is 2^8 2008-08-19 02:59 say there are 10 million inodes 2008-08-19 03:00 packed 32/block 2008-08-19 03:00 I think you should think about per CPU-ification straight up initially as apart of the design 2008-08-19 03:00 so that you avoid these issues 2008-08-19 03:00 2^18 blocks about 2008-08-19 03:00 you might have to push it down to an inode level or something and replicate all of the volume bits above it 2008-08-19 03:00 which is 3 btree levels 2008-08-19 03:01 yes, that is the right way to think about it 2008-08-19 03:01 no bouncing 2008-08-19 03:01 yeah, talking to matt about it will help us 2008-08-19 03:01 it's nearly 2 levels 2008-08-19 03:01 worth trying to make it 2 levels 2008-08-19 03:01 er, you. 
I'm avoiding work right now ;) 2008-08-19 03:01 then cache the root 2008-08-19 03:02 I'll stay up later to compensate 2008-08-19 03:02 that's one probe to get to the inode 2008-08-19 03:02 flips: I think it's critical to think about how you're going to organize the metadata, what for specific use at a specific time 2008-08-19 03:02 the versioning pointer stuff is really potentially powerful 2008-08-19 03:02 been thought about a lot 2008-08-19 03:03 I'm thinking about how to pack the btree nodes better now 2008-08-19 03:03 because caching this shit properly is a major bitch 2008-08-19 03:03 yes 2008-08-19 03:04 right now it's big an homogenous 2008-08-19 03:04 an=and 2008-08-19 03:04 which sounds like shitty cache performance 2008-08-19 03:05 which means that you have to think about these things straight up 2008-08-19 03:05 before trying to really fully implement it 2008-08-19 03:05 it's not homogenous 2008-08-19 03:05 inode table blocks try to have related inodes 2008-08-19 03:06 blocks ? 2008-08-19 03:06 directory blocks have temporally related entries 2008-08-19 03:06 leaves of the inode table btree 2008-08-19 03:06 have more than one inode per blocks 2008-08-19 03:28 -!- pgquiles(~pgquiles@246.Red-81-37-88.dynamicIP.rima-tde.net) has joined #tux3 2008-08-19 03:29 getting sleepy 2008-08-19 03:29 night 2008-08-19 03:29 night 2008-08-19 03:29 you're up late as well, wow 2008-08-19 04:51 -!- juancarlos(~juancarlo@33.Red-83-53-239.dynamicIP.rima-tde.net) has joined #tux3 2008-08-19 04:51 -!- juancarlos(~juancarlo@33.Red-83-53-239.dynamicIP.rima-tde.net) has left #tux3 2008-08-19 10:13 -!- pgquiles_(~pgquiles@154.Red-83-33-145.dynamicIP.rima-tde.net) has joined #tux3 2008-08-19 11:17 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-19 13:54 flips: you there ? 2008-08-19 13:55 have you thought about cluster failover in your system yet ? 
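The back-of-the-envelope depth estimate a few lines up (10 million inodes, packed 32 per block, index fanout 2^8) can be checked mechanically. This small helper assumes every node is packed full, which real btrees will not be.

```python
def index_levels(items, per_leaf, fanout):
    """Return (leaf blocks, index levels above the leaves) for a fully packed btree."""
    leaves = -(-items // per_leaf)        # ceiling division
    nodes, levels = leaves, 0
    while nodes > 1:
        nodes = -(-nodes // fanout)       # index nodes needed one level up
        levels += 1
    return leaves, levels

print(index_levels(10_000_000, 32, 2**8))   # (312500, 3)
```

The 312,500 leaves are a little over 2^18, and they need 3 index levels. Two index levels cover 256^2 = 65,536 leaves, about 2.1 million inodes at 32 per block, which is why the tree is "nearly 2 levels". If it could be held to two, caching the root would leave one index probe before the leaf.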
2008-08-19 13:55 yes 2008-08-19 13:55 yes 2008-08-19 13:55 a little 2008-08-19 13:55 what do you think about union FS and btrfs ? 2008-08-19 13:55 mostly about how atomic commit will work on a cluster 2008-08-19 13:55 neither has much to do with a cluster 2008-08-19 13:56 perhaps you are talking about failing over the underlying volume? 2008-08-19 13:56 yes 2008-08-19 13:57 well, something like paired nodes taking over when one or the other fails 2008-08-19 13:57 this would be in a kind of grid computing environment 2008-08-19 13:57 cluster lite 2008-08-19 13:58 what's your opinion aobut btrfs ? 2008-08-19 13:58 about 2008-08-19 13:58 I was thinking about the extend tux3 to be a clusterfs issue 2008-08-19 13:58 btrfs in general? I wish them good luck 2008-08-19 13:58 get stable and be better than zfs 2008-08-19 13:58 folks seem to be interested in it and there's increasing engineering effort going into it 2008-08-19 13:58 but it has the same design flaw as zfs 2008-08-19 13:59 I wasn't impressed by it when I looked at it actually 2008-08-19 13:59 mashes the lvm together with the filesystem, bad 2008-08-19 13:59 me neither 2008-08-19 13:59 oh 2008-08-19 13:59 I meant zfs 2008-08-19 13:59 btrfs is a zfs knockoff, and I think zfs kind of sucks 2008-08-19 13:59 zfs is slow for one thing 2008-08-19 14:05 btree algorithms sure got solid fast once I implemented the unit test 2008-08-19 14:05 now let's try the shiny new advance method 2008-08-19 14:08 I just wasnt impressed by it 2008-08-19 14:08 it's like a bad knock off of WAFL 2008-08-19 14:09 without any of the coolness of that system 2008-08-19 14:09 maybe I'm wrong, we'll find out 2008-08-19 14:09 btrfs is getting a lot of attention and resources right now so we'll see 2008-08-19 14:11 I don't think you're far off the mark 2008-08-19 14:11 spent some time in the code myself 2008-08-19 14:12 tux3 file index btrees just got better 2008-08-19 14:12 two line hack 2008-08-19 14:12 improves average leaf fullness from 50% 
to 100% 2008-08-19 14:12 nice 2008-08-19 14:13 the thing that I wondered about regarding btrfs is that it's pulling all sorts of things together, but I just don't understand why and to what ends 2008-08-19 14:13 doesn't seem to break anything either. I'd appreciate comment on the post I just did on tux3 though 2008-08-19 14:13 that's the main problem I have with it 2008-08-19 14:13 right 2008-08-19 14:13 as well as it kind of ignoring all of the intricacies of how complicated a COW FS is 2008-08-19 14:13 it's really dumb to do that stuff with the volume manager when it isn't necessary 2008-08-19 14:14 I already figured out how to do the redudant metadata thing they are obsessed with, without violating the lvm boundary 2008-08-19 14:14 don't know much about the lvm, but it seems like a bunch of grab bag items thrown together for some unclear reasons 2008-08-19 14:15 you should comment on some of the intricacies 2008-08-19 14:15 one of them is certainly allocation 2008-08-19 14:15 like lvm isn't known to handle metadata specifically, so I don't know about how they're going to pull that together 2008-08-19 14:15 they don't do it in the lvm 2008-08-19 14:15 or anything really with regards to lvm 2008-08-19 14:16 they make multiple copies of each metadata block and have multiple pointers to them 2008-08-19 14:16 that is just dumb 2008-08-19 14:16 have one pointer to the metadata block and make the block redundant at the lvm level 2008-08-19 14:16 duh 2008-08-19 14:16 it's not bad if it's done for a specific reason to solve a particular problem with metadata 2008-08-19 14:16 it's a general fear of bad blocks 2008-08-19 14:16 or disks going bad 2008-08-19 14:17 it's a really top heavy solution 2008-08-19 14:17 so let the RAID layer handle it ? 
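The log does not say what the "two line hack" that raised average leaf fullness from 50% to 100% actually was. One well-known trick with exactly that effect on sequential inserts is to split a full leaf at the end rather than in the middle when the new key is being appended. The new leaf then starts empty instead of taking half the old one. The sketch below illustrates that trick only, with hypothetical names. It is not the tux3 change.

```python
def split_point(leaf_len, insert_pos):
    """Where to split a full leaf before inserting at insert_pos.

    Appending at the end (ascending keys): keep the old leaf full and start
    the new leaf empty.  Otherwise fall back to the usual half split.
    """
    return leaf_len if insert_pos == leaf_len else leaf_len // 2


def insert_sequential(keys, capacity, split=split_point):
    """Insert ascending keys into a list of leaves and return the leaves."""
    leaves = [[]]
    for key in keys:
        leaf = leaves[-1]   # ascending keys always land in the last leaf
        if len(leaf) == capacity:
            at = split(len(leaf), len(leaf))
            leaves[-1], new = leaf[:at], leaf[at:]
            leaves.append(new)
            leaf = new
        leaf.append(key)
    return leaves


def fill_factor(leaves, capacity):
    return sum(map(len, leaves)) / (len(leaves) * capacity)
```

With a plain half split (`split=lambda n, pos: n // 2`), 1000 ascending keys in 8-entry leaves end up about 50% full. With the append-aware split, they end up 100% full.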
2008-08-19 14:17 yes 2008-08-19 14:17 hmmm 2008-08-19 14:17 well, hard to say 2008-08-19 14:17 easy 2008-08-19 14:17 yeah, that makes sense, but I'm wondering where this will all go to 2008-08-19 14:17 have regions of different redundancy 2008-08-19 14:18 25% redundant for data, 200% for metadata 2008-08-19 14:18 interleave the regions, the filesystem knows which have which level of redundancy and sets allocation targets accordingly 2008-08-19 14:19 if necessary, have to lvm remap some regions to achieve higher or lower redundancy levels 2008-08-19 14:19 it would almost certainly be good enough just to let 1% of the volume be 200% redudant 2008-08-19 14:20 and distribute that evenly through the volume 2008-08-19 14:20 how's things going today ? was our discussion useful in pointing out problems last night ? 2008-08-19 14:20 was good 2008-08-19 14:20 reviewing a little now 2008-08-19 14:21 yes, the locking stuff 2008-08-19 14:21 nee to make a coherent proposal on the list 2008-08-19 14:21 don't know, maybe btrfs will win and I'm wrong about my skepticism 2008-08-19 14:21 also need to make a coherent proposal on atomic commit, get the basics working in user space 2008-08-19 14:22 linux filesystem projects do tend to keep moving along 2008-08-19 14:22 btrfs has some good helpers 2008-08-19 14:22 yeah, maybe they'll win 2008-08-19 14:22 though most of the coding still seems to fall on chris 2008-08-19 14:23 it's btrfs vs zfs, not vs tux3 imho 2008-08-19 14:23 I think btrfs has a good chance against zfs 2008-08-19 14:23 but my experience is that the linux page cache is inadequate for enterprise level filers 2008-08-19 14:23 somewhat true 2008-08-19 14:23 you really need something different than that 2008-08-19 14:23 the radix tree stuff is pretty good 2008-08-19 14:23 something really particular to buffers because of the mirroring logic and stuff 2008-08-19 14:24 buffer handling needs a big fix, that is true 2008-08-19 14:24 buffers have to be individually marked so 
that you know that it's been replicated properly, etc... 2008-08-19 14:24 tux3 worries about taht 2008-08-19 14:24 oh, mirroring 2008-08-19 14:24 whether the buffers you're copying and indirect blocks are valid for the copy after online checking 2008-08-19 14:24 not a good way to mirror 2008-08-19 14:25 delta mirroring is the right way to go, otherwise you probably just want raid1 2008-08-19 14:26 to delta mirror, you don't try to copy indirect blocks, just leaf data 2008-08-19 14:26 let the destination worry about setting up the indirect blocks 2008-08-19 14:27 got some 30 level btress happening ;-) 2008-08-19 14:27 by cutting the leafs down to 7 elements per 2008-08-19 14:28 beauty is when all the smoke clears and every buffer has zero use count 2008-08-19 14:35 -!- vandenoever(~vandenoev@ip5657eb5b.direct-adsl.nl) has joined #tux3 2008-08-19 14:35 good evening 2008-08-19 14:35 hi 2008-08-19 14:35 vandenoever: hi 2008-08-19 14:35 hi flips, i hear you rule this realm 2008-08-19 14:35 flips: vandenoever is the guy behind strigi 2008-08-19 14:35 vandenoever: flips is the guy behind tux3 2008-08-19 14:35 let the party begin! 2008-08-19 14:35 rule would be a bit of an exaggeration 2008-08-19 14:35 :-) 2008-08-19 14:36 friend of yours pgquiles_? 2008-08-19 14:36 right 2008-08-19 14:36 flips: you're the main attraction 2008-08-19 14:36 dunno, shapor is kind of cute 2008-08-19 14:36 ACTION ducks 2008-08-19 14:36 strigi looks very cool 2008-08-19 14:36 and I am a huge kde fan 2008-08-19 14:37 flips: i'd have to go by glyhp curves on that , which is rather hard 2008-08-19 14:37 flips: that 's a good start :-) 2008-08-19 14:37 so i was wondering if at some point there should be indexes as part of the filesystem 2008-08-19 14:37 I used to use glimpse a lot, just for lxr 2008-08-19 14:37 it never got gree 2008-08-19 14:38 then htdig came along 2008-08-19 14:38 which kde still uses for docs 2008-08-19 14:38 the shame! 
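The redundancy-region proposal above works as follows. The lvm makes some interleaved regions highly redundant (200%) and the rest cheaply redundant (25%). About 1% of the volume gets the high level, spread evenly, and the filesystem aims metadata allocations at those regions. A sketch of the filesystem side, assuming a fixed region size and every 100th region being a metadata region. `REGION_BLOCKS`, `METADATA_EVERY` and both functions are hypothetical. The sketch also ignores the end of the volume.

```python
REGION_BLOCKS = 1024      # hypothetical region size, in blocks
METADATA_EVERY = 100      # one region in 100 (1% of the volume) is 200% redundant

def redundancy(block):
    """Redundancy level of the region holding block: 2.0 means 200%."""
    return 2.0 if (block // REGION_BLOCKS) % METADATA_EVERY == 0 else 0.25

def metadata_target(goal):
    """First block of the high-redundancy region nearest to goal."""
    region = goal // REGION_BLOCKS
    below = region - region % METADATA_EVERY
    above = below + METADATA_EVERY
    nearest = below if region - below <= above - region else above
    return nearest * REGION_BLOCKS
```

Only one pointer to each metadata block is needed here. The extra copies live below the lvm boundary, and the filesystem just steers metadata into the right regions.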
2008-08-19 14:38 well, change the world step by step 2008-08-19 14:38 I suppose strigi beats it in every way? 2008-08-19 14:38 well 2008-08-19 14:38 sort of 2008-08-19 14:38 I would like to solve the problem of accurately maintaining an index 2008-08-19 14:39 without necessarily building it into the fs 2008-08-19 14:39 flips: that's the most urgent one 2008-08-19 14:39 flips: yes, let's not overdo it 2008-08-19 14:39 ddnotify ;-) 2008-08-19 14:39 you just invented that? nice 2008-08-19 14:39 nope 2008-08-19 14:39 I invented a bunch of other ddthings 2008-08-19 14:40 and ddlink might be really useful 2008-08-19 14:40 i mean: you just invented the name 2008-08-19 14:40 right 2008-08-19 14:40 ddlink is cool 2008-08-19 14:40 it is a tight two way coupling between kernel and userspace 2008-08-19 14:40 suitable for tasks like sending change notifies 2008-08-19 14:40 never heard of it ... 2008-08-19 14:41 google 2008-08-19 14:41 "ddlink kernel" 2008-08-19 14:41 yes 2008-08-19 14:41 it doesn't have a high profile 2008-08-19 14:41 ddlink phillips 2008-08-19 14:41 ACTION finds a pdf about instant startup 2008-08-19 14:41 even then... 2008-08-19 14:41 bleah 2008-08-19 14:41 just a sec 2008-08-19 14:41 An alternative interface to device mapper 2008-08-19 14:42 yes 2008-08-19 14:42 In more detail: ddlink is a generic pipe-like interface for controlling 2008-08-19 14:42 device drivers. 2008-08-19 14:42 hmm 2008-08-19 14:42 show you how much the world cares about that ;-) 2008-08-19 14:43 anything, the thing is, you can poll on a ddlink 2008-08-19 14:43 and it can send you, say, filesystem specific change notifications 2008-08-19 14:43 but i dont want to poll 2008-08-19 14:43 what would you like? 
2008-08-19 14:43 oh 2008-08-19 14:43 not on a given inode 2008-08-19 14:43 a stream of file changes to read from 2008-08-19 14:43 that's right 2008-08-19 14:43 ddlink does that 2008-08-19 14:44 ok, cool 2008-08-19 14:44 poll just lets you read it efficiently 2008-08-19 14:44 filtered for user rights? 2008-08-19 14:44 that's a detail of the ddlink instance 2008-08-19 14:44 but yes 2008-08-19 14:44 obeys access rules 2008-08-19 14:44 by default 2008-08-19 14:44 so i go and say: give me a pipe to read file changes on /dev/sda3 2008-08-19 14:44 ? 2008-08-19 14:45 exactly 2008-08-19 14:45 and this is a kernel module? how is this exposed? 2008-08-19 14:45 it could also be on a superblock 2008-08-19 14:45 it is a kernel library 2008-08-19 14:45 a module instantiates a ddlink with a few methods 2008-08-19 14:45 ok, so no userspace api yet 2008-08-19 14:46 sure 2008-08-19 14:46 it's a normal pipish kind of api 2008-08-19 14:46 good 2008-08-19 14:46 posted some minimal demos 2008-08-19 14:46 and have much nicer ones 2008-08-19 14:46 so that's step 1 2008-08-19 14:46 this idea has been in kernel for a long time 2008-08-19 14:46 here's problem 2 2008-08-19 14:46 see rpc_pipefs 2008-08-19 14:47 flips: but not generally part of vfs? 2008-08-19 14:47 not at the level, doesn't need to be 2008-08-19 14:47 it just uses the vfs to do its thing 2008-08-19 14:47 so let's assume this would work (i'll read up) 2008-08-19 14:48 see this scenario: kernel boots 2008-08-19 14:48 fs is mounted, X is started 2008-08-19 14:48 user logs in 2008-08-19 14:48 files are changed 2008-08-19 14:48 desktop start 2008-08-19 14:48 desktop search starts 2008-08-19 14:48 ddlink is opened 2008-08-19 14:48 unf. we have missed file changes at this point 2008-08-19 14:49 i'd like the indexer to say to the filesystem: 2008-08-19 14:49 "the last change i got from you was N. what has happened since?" 
2008-08-19 14:49 so fs needs a circular log 2008-08-19 14:49 good, no problem 2008-08-19 14:50 ddlink maintains an arbitrarily long queue 2008-08-19 14:50 waiting for someone to come along and slurp it up 2008-08-19 14:50 but not on disk, right? 2008-08-19 14:50 doesn't make the fs or anything wait synchronously either 2008-08-19 14:50 no 2008-08-19 14:50 memory 2008-08-19 14:50 because the same happens on shutdown 2008-08-19 14:50 ok, you want something on disk too? 2008-08-19 14:50 sounds reasonable 2008-08-19 14:50 or when indexer crashes 2008-08-19 14:51 you don't want kernel to buffer forever, right? 2008-08-19 14:51 or when user logs in without starting the indexer 2008-08-19 14:51 no, should be a reasonable limit 2008-08-19 14:51 because we can always do a full scan 2008-08-19 14:51 fs can say: " i dont remember all of that" and indexer does a full scan 2008-08-19 14:52 slight security problem: N should not be sequence 2008-08-19 14:52 ? 2008-08-19 14:52 anyway this is all a can do 2008-08-19 14:53 intruder could know how much was written by monitoring N 2008-08-19 14:53 even having the filesystem buffer the changes on disk 2008-08-19 14:53 nothing hard about it 2008-08-19 14:53 no, just has to be in the design 2008-08-19 14:53 which is why pgquiles_ pushed me here 2008-08-19 14:53 he said: go, go, flips is designing, we can add cruft! 
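The indexer's request above is "the last change I got from you was N, what has happened since?". It pairs naturally with a bounded log that can answer "I don't remember all of that" so the indexer falls back to a full scan. A minimal in-memory sketch; `ChangeLog`, `since` and the `FULL_SCAN` sentinel are hypothetical names.

```python
from collections import deque

FULL_SCAN = object()   # sentinel: events were dropped, rescan everything

class ChangeLog:
    """Hypothetical bounded change log kept by the filesystem."""

    def __init__(self, limit):
        self.events = deque(maxlen=limit)  # (seq, inode, kind); oldest dropped first
        self.next_seq = 1

    def record(self, inode, kind):
        self.events.append((self.next_seq, inode, kind))
        self.next_seq += 1

    def since(self, last_seen):
        """Events after last_seen, or FULL_SCAN if some were already discarded."""
        oldest = self.events[0][0] if self.events else self.next_seq
        if last_seen + 1 < oldest:
            return FULL_SCAN
        return [e for e in self.events if e[0] > last_seen]
```

Here the cursor is a plain sequence number, which is exactly what the log flags as leaking how much was written. An opaque cursor would be needed to address that.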
2008-08-19 14:54 just kidding, but he did push me here because this is a good point to take this stuff into account 2008-08-19 14:54 :-) 2008-08-19 14:55 flips: the btrfs folks have a more concurrent b-tree implementation now 2008-08-19 14:55 according to their announcement 2008-08-19 14:55 ok, convince me that the events actually have to be buffered on disk as opposed to in memory 2008-08-19 14:55 I think I am closed to convinced 2008-08-19 14:55 but I bet you dillon has something to say about that with replications 2008-08-19 14:55 nice excuse to add that cruft to the design ;-) 2008-08-19 14:55 flips: it's a performance thing 2008-08-19 14:55 bh, I was aware of it 2008-08-19 14:55 ok 2008-08-19 14:55 if a user logs in, now the first thing that happens is that the indexer puts inotify watches everywhere 2008-08-19 14:56 bh, you're syncing up pretty fast 2008-08-19 14:56 or scans all dirs for changes 2008-08-19 14:56 yeah 2008-08-19 14:56 sucks 2008-08-19 14:56 I know 2008-08-19 14:56 thought about it 2008-08-19 14:56 with a cache on disk, indexer gets a short list of modified changes and is in sync 2008-08-19 14:56 yes 2008-08-19 14:56 that is the right way to go, I will make a design note 2008-08-19 14:57 eh syncing up on what ? current development on linux file systems ? 2008-08-19 14:57 ACTION dances 2008-08-19 14:57 just taking an interest largely because of your announcement 2008-08-19 14:57 change notification needs to be a first class citizen of a filesystem, you showed that 2008-08-19 14:57 flips: deliver us from inotify! 
;-) 2008-08-19 14:57 I will do my best 2008-08-19 14:57 don't want to be negative, but I've been kind of down about Linux fs development overall 2008-08-19 14:57 this buffering could possibly be done at the vfs level too 2008-08-19 14:57 it just seems to scattered and disjointed 2008-08-19 14:57 only after gaining experience at the fs level 2008-08-19 14:58 bh, syncing up with btrfs facts 2008-08-19 14:58 flips: you mean vfs writes to a log file, so it works for al fses? 2008-08-19 14:58 most people just go on general impressions 2008-08-19 14:58 exactly 2008-08-19 14:58 but first some filesystem has to implement it and get it right 2008-08-19 14:58 before generalizing 2008-08-19 14:58 that would be even better, but log format would have to allow for sanity checking it 2008-08-19 14:58 and getting a mess like quota files 2008-08-19 14:59 yep 2008-08-19 14:59 obviously getting it in any fs is fine with me 2008-08-19 15:00 i was just wondering how this should be started 2008-08-19 15:00 a simple fuse with a change log could be used for designing 2008-08-19 15:00 starts with a design note I think 2008-08-19 15:00 uhuh 2008-08-19 15:00 well 2008-08-19 15:00 a bogus kernel module faking a ddlink would be good 2008-08-19 15:00 flips: i know, i'm cursing in the church of kernelspace 2008-08-19 15:01 you could do this: have two ddlinks 2008-08-19 15:01 you use one to feed fake filesystem behaviour into the kernel 2008-08-19 15:01 your index code uses the other 2008-08-19 15:01 as it would if the filesystem were generating the fake events 2008-08-19 15:01 my index code can make a ddlink? 
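If change events are buffered on disk, the log format has to allow sanity checking, as noted above, so that a torn or corrupted tail cannot turn into a mess like broken quota files. One common way is a checksum per record plus a sequence check. The record layout below (seq, inode, kind, then a CRC32) is an assumption for illustration, not a tux3 or VFS format.

```python
import struct
import zlib

# Hypothetical record: seq (u64), inode (u64), kind (u32), then CRC32 of those 20 bytes.
BODY = struct.Struct("<QQI")
CRC = struct.Struct("<I")
RECORD_SIZE = BODY.size + CRC.size

def encode(seq, inode, kind):
    body = BODY.pack(seq, inode, kind)
    return body + CRC.pack(zlib.crc32(body))

def decode_all(buf):
    """Yield valid records in order; stop at the first torn, corrupt or out-of-order one."""
    prev = None
    for off in range(0, len(buf) - RECORD_SIZE + 1, RECORD_SIZE):
        body = buf[off:off + BODY.size]
        (crc,) = CRC.unpack_from(buf, off + BODY.size)
        if zlib.crc32(body) != crc:
            return
        rec = BODY.unpack(body)
        if prev is not None and rec[0] != prev + 1:
            return
        prev = rec[0]
        yield rec
```

A reader that stops at the first bad record loses at most the damaged tail. It can then fall back to a full scan for anything older than what it recovered.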
2008-08-19 15:02 the module does 2008-08-19 15:02 by any method 2008-08-19 15:02 oki 2008-08-19 15:02 I currently favor ioctl for creating ddlinks 2008-08-19 15:03 for example, ioctl a file, the root of a fs or any other file 2008-08-19 15:03 to get your ddlink 2008-08-19 15:03 I use ioctl code 0xdd for that ;-) 2008-08-19 15:03 :-) 2008-08-19 15:05 flips: then that fd is a pipe from which to read the changes? 2008-08-19 15:05 yes 2008-08-19 15:05 the ioctl returns a fd 2008-08-19 15:05 boy, kernel programming almost sounds easy! 2008-08-19 15:06 this was pretty clean 2008-08-19 15:06 then invent a protocol 2008-08-19 15:06 right 2008-08-19 15:06 that part is fun 2008-08-19 15:06 too bad we cannot use inodes in the protocol 2008-08-19 15:06 I mostly just send structs over the pipe 2008-08-19 15:06 ? 2008-08-19 15:06 or can we 2008-08-19 15:07 you can use anything that positively identifies the change 2008-08-19 15:07 inode numbers would be good 2008-08-19 15:07 we need to tell the path and the type of change i guess 2008-08-19 15:07 much better than names I think 2008-08-19 15:07 can i map inode to path? 2008-08-19 15:07 yes 2008-08-19 15:07 the heavens open! 2008-08-19 15:07 really? how? 
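The consumer side described above works like this: an ioctl returns an fd, and fixed-size structs arrive over it and are read when poll says they are ready. This part can be exercised in user space with an ordinary pipe standing in for the ddlink fd. The ioctl itself is not modelled. The event layout (inode, parent directory inode, kind) and the constants are hypothetical. `select.poll` is available on Linux and other POSIX systems, not Windows.

```python
import os
import select
import struct

# Hypothetical event layout: changed inode, its parent directory inode, event kind.
EVENT = struct.Struct("<QQI")
CREATE, MODIFY, DELETE = 1, 2, 3

def send_event(fd, inode, parent, kind):
    """Producer side (stand-in for the kernel module): write one fixed-size struct."""
    os.write(fd, EVENT.pack(inode, parent, kind))

def read_events(fd, timeout_ms=0):
    """Consumer side: poll the fd and decode every complete event available."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events, pending = [], b""
    while poller.poll(timeout_ms):
        chunk = os.read(fd, 4096)
        if not chunk:              # writer side closed
            break
        pending += chunk
        while len(pending) >= EVENT.size:
            events.append(EVENT.unpack(pending[:EVENT.size]))
            pending = pending[EVENT.size:]
    return events
```

Identifying changes by inode number rather than by name keeps each struct fixed-size. Pipe writes this small are atomic, so a struct is never interleaved with another writer's.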
2008-08-19 15:07 you want to use some kind of handle for a directory, not a path I think 2008-08-19 15:08 path handling is crufty 2008-08-19 15:08 gets hard when the path changes asynchronously 2008-08-19 15:08 index uses urls as handles 2008-08-19 15:08 you would use the ddlink to ask the fs to tell you the name of an inode 2008-08-19 15:08 now 2008-08-19 15:08 of course 2008-08-19 15:08 there is a problem 2008-08-19 15:09 the inode can be multiply linked 2008-08-19 15:09 flips: ah ok, yes, that's possible, but i was not planning on talking ddlink, just to listen 2008-08-19 15:09 what you want is directory handles 2008-08-19 15:09 much linke openat etc 2008-08-19 15:09 much link 2008-08-19 15:09 much like 2008-08-19 15:10 directory handle + name 2008-08-19 15:10 instead of path/name 2008-08-19 15:10 then ask for directory name + parent handle till we reach root? 2008-08-19 15:10 right, that is always precisely defined 2008-08-19 15:10 unix semantics 2008-08-19 15:11 i see 2008-08-19 15:11 some notion of filesystem object would be cool 2008-08-19 15:11 an inode is a good object id 2008-08-19 15:11 the thing is, to decouple the object id from the name 2008-08-19 15:11 filesystem object it root for the ddlink module 2008-08-19 15:12 anyway, you're the expert there 2008-08-19 15:12 flips: for id we do use the path and we have currently no mechanism of transferring indexed information when moving a file 2008-08-19 15:12 so you could have a real id 2008-08-19 15:13 and map your current paths to a made up id 2008-08-19 15:13 but use the real, inode id if available 2008-08-19 15:13 we could but we'd have to change the entire indexer api 2008-08-19 15:14 do we need to do this to ensure that we are in sync? 2008-08-19 15:15 do it some time in the future 2008-08-19 15:15 i realize that inodes are more efficient in terms of moving and double linking 2008-08-19 15:15 it's just more accurate 2008-08-19 15:15 I think 2008-08-19 15:15 flips: what if a defrag tool comes along? 
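The scheme suggested above identifies a file by directory handle plus name, then asks for each directory's name and parent until reaching the root. It is well defined because, under Unix semantics, a directory has exactly one parent even when the file itself is hard-linked elsewhere. A sketch with a plain dict standing in for the filesystem queries; `ROOT`, `parent_of` and `path_of` are hypothetical.

```python
ROOT = 1   # hypothetical root directory inode number

def path_of(dir_inode, name, parent_of):
    """Rebuild a path from a (directory handle, name) pair.

    parent_of maps a directory inode to (parent directory inode, own name).
    """
    parts = [name]
    seen = set()
    while dir_inode != ROOT:
        if dir_inode in seen:          # defensive: a corrupt map could loop
            raise ValueError("directory cycle")
        seen.add(dir_inode)
        dir_inode, dir_name = parent_of[dir_inode]
        parts.append(dir_name)
    return "/" + "/".join(reversed(parts))
```

Renaming a directory only changes one entry in `parent_of`, and every path below it follows. In a live system the walk can still race with concurrent renames, which is the "path changes asynchronously" problem mentioned above.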
2008-08-19 15:15 at least, use the directory id's I think 2008-08-19 15:15 that should map to your stuff 2008-08-19 15:15 defraggers renumbering inodes? 2008-08-19 15:16 sure it's a danger 2008-08-19 15:16 but that should just look like a series of valid operations to you 2008-08-19 15:16 or what if user restores a backup and index was on another disk? 2008-08-19 15:16 you know you have it right when you can follow events through that maze 2008-08-19 15:16 then depending on type of restore, inodes might be different 2008-08-19 15:16 flips: it's still subject to implementation issues like everything, I couldn't predict the performance of either btrfs or tux3 until there was an implementation in place for testing 2008-08-19 15:16 true 2008-08-19 15:17 I'd tend to go for some kind of "meld" process to handle extreme events like that 2008-08-19 15:17 flips: it's an index so we can always rebuild it 2008-08-19 15:17 sounds like invalidating the whole index would be right in those cases 2008-08-19 15:17 right 2008-08-19 15:17 what we aim for is to be 99% sure that we dont need to do much work 2008-08-19 15:17 when starting up 2008-08-19 15:17 bh, you can't just assume that I'll make it kickass? ;-) 2008-08-19 15:18 yes 2008-08-19 15:18 I think I get it 2008-08-19 15:18 and to be able to tolerate startup + shutdown + startup where the indexer doesn't run for a whole cycle 2008-08-19 15:18 we can still have a 'do full scan' button, but it should not be needed 2008-08-19 15:18 and just picks up as if it did 2008-08-19 15:18 right 2008-08-19 15:19 I think I have a pretty clear picture 2008-08-19 15:19 will dust off ddsnap code 2008-08-19 15:19 ddlink I mean 2008-08-19 15:19 and refresh to current 2008-08-19 15:19 ddlink lives as a patch? 
2008-08-19 15:19 flips: I hope so, but man the rumors about you and stuff give me doubt about you 2008-08-19 15:19 especially those freaky roller blades and stuff 2008-08-19 15:19 and funky hat 2008-08-19 15:20 weird friends 2008-08-19 15:20 http://phunq.net/ddtree 2008-08-19 15:20 I'll make it a patch 2008-08-19 15:20 ACTION giggles 2008-08-19 15:20 bh: oh my, now i have images of the 90s in my head 2008-08-19 15:20 bh, I take off the roller blades to debug 2008-08-19 15:20 and put them on your head to keep out the voice from outer spaec ? 2008-08-19 15:21 space ? 2008-08-19 15:21 :) 2008-08-19 15:21 my daughter likes to put them on 2008-08-19 15:21 no matter what I do, I can't keep the voices out 2008-08-19 15:21 flips: is ddlink tux3 specific? 2008-08-19 15:21 right now they're telling me to test the btree advance ;-) 2008-08-19 15:22 v, not at all 2008-08-19 15:22 completely generic, I find new places to use it all the time 2008-08-19 15:22 it will be an integral part of lvm3 2008-08-19 15:22 trond is even interested in changing nfs to use it 2008-08-19 15:23 it's cleaner than rpc_pipefs 2008-08-19 15:23 ACTION runs and hides 2008-08-19 15:24 when do you think you'll overrun btrfs ? :) 2008-08-19 15:24 christmas 2008-08-19 15:24 which christmas is left unspecified 2008-08-19 15:24 seriously ? 2008-08-19 15:24 :-D 2008-08-19 15:24 I was just joking actually 2008-08-19 15:24 I'm always serious ;-) 2008-08-19 15:24 ah, ok 2008-08-19 15:24 so before abolition of christianity and capitalism 2008-08-19 15:25 days before 2008-08-19 15:25 faster if coders send patches 2008-08-19 15:25 bh, you can still grab the glory of being contributor #3 2008-08-19 15:26 oh fuck 2008-08-19 15:26 :) 2008-08-19 15:26 I have this nasty schedule code/bug to work through 2008-08-19 15:26 scheduler 2008-08-19 15:26 and I'm kind of half clueless about what's going on 2008-08-19 15:26 bh, how about I do a quick kernel port and you make nice btree locking? 
2008-08-19 15:27 should not be a huge time investment 2008-08-19 15:27 it's a hard problem I'm not sure what the best method is 2008-08-19 15:27 rcu on the upper nodes 2008-08-19 15:27 well, it depends 2008-08-19 15:27 hmmm 2008-08-19 15:27 needs to be cognizant of the atomic commit algorithm 2008-08-19 15:27 which I must crystallize first 2008-08-19 15:27 even spinning on the upper nodes would be find 2008-08-19 15:28 mutex on the deep nodes 2008-08-19 15:28 flips: i'm going to read up on ddlink and await the resurrection of it 2008-08-19 15:28 it's in the pipeline 2008-08-19 15:28 ddsetup is the example program, a clone of dmsetup 2008-08-19 15:28 but written in a fraction of the code 2008-08-19 15:28 flips: can you mail me when you get an update? then i can add another fs event backend to the indexer 2008-08-19 15:29 how about this: you describe your ask on tux3 mailing list 2008-08-19 15:29 then I respond by giving you something concrete? 2008-08-19 15:29 ok, will do so tomorrow, now ddlink bedtime reading and sleep 2008-08-19 15:29 cross post to your own ml 2008-08-19 15:30 there was some good discussion of it on lkml 2008-08-19 15:30 flips: how long ago? 2008-08-19 15:30 jon corbet asked me why not netlink 2008-08-19 15:30 and I showed why not pretty convincingly 2008-08-19 15:30 year or so 2008-08-19 15:30 was it on lwn.net if corbet was asking? 2008-08-19 15:30 lkml I think 2008-08-19 15:30 jon sometimes posts 2008-08-19 15:31 http://lkml.org/lkml/2008/3/5/327 2008-08-19 15:32 right 2008-08-19 15:32 a slightly exaggerated comparison 2008-08-19 15:32 but only slightly 2008-08-19 15:32 netlink really does suck 2008-08-19 15:32 for the press :-) 2008-08-19 15:32 right 2008-08-19 15:36 ok, good night! 
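The locking sketched in the conversation is RCU or spinning on the upper nodes and a mutex on the deep ones. It builds on the usual per-node lock coupling ("crabbing"): take the child's lock before releasing the parent's, so a reader never sees a node in between. Python has no RCU or spinlocks, so this sketch uses a `threading.Lock` per node for every level. It covers lookups only. Writers that may split nodes need more than this. `Node` and `lookup` are hypothetical.

```python
import threading
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None, values=None):
        self.keys = keys            # separators (index node) or keys (leaf)
        self.children = children    # None for a leaf
        self.values = values
        self.lock = threading.Lock()

def lookup(root, key):
    """Hand-over-hand descent: lock the child before unlocking the parent."""
    node = root
    node.lock.acquire()
    try:
        while node.children is not None:
            # children[i] holds keys >= keys[i-1] and < keys[i]
            child = node.children[bisect_right(node.keys, key)]
            child.lock.acquire()
            node.lock.release()
            node = child
        i = bisect_right(node.keys, key) - 1
        return node.values[i] if i >= 0 and node.keys[i] == key else None
    finally:
        node.lock.release()
```

In a kernel version, the per-level lock type could differ as suggested: cheap or lock-free on the hot upper nodes, sleeping locks near the leaves where I/O may happen.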
2008-08-19 15:36 ACTION gets back to the advance test 2008-08-19 15:46 pop to level 1, 3 of 3 nodes 2008-08-19 15:46 [5815] devmap_blockio: read block dddddddddddddddd 2008-08-19 15:46 [5815] devmap_blockio: Failed assertion "dev->bits >= 9 && dev->fd" 2008-08-19 15:46 bugz ;-) 2008-08-19 15:46 ACTION is still reading the backlog, he ran away for (very late) dinner when tech discussion started 2008-08-19 15:46 was good 2008-08-19 15:46 good intro 2008-08-19 15:50 flips: I also happen to know one of the guys developing Tracker (http://www.gnome.org/projects/tracker/), in case you want another point of view... 2008-08-19 15:51 it he interfacing to strigi? 2008-08-19 15:51 no 2008-08-19 15:51 because? 2008-08-19 15:51 I'd be interested in why 2008-08-19 15:51 because it's like strigi but started by the gnome people 2008-08-19 15:52 :-) 2008-08-19 15:52 gnome guys usually get everything wrong 2008-08-19 15:52 and IIRC they have their own indexing engine (strigi uses clucene) 2008-08-19 15:52 I have all this marginally useless gnome invention cruft on my system 2008-08-19 15:52 some gnome thing was screwing up this morning and I'm not running gnome 2008-08-19 15:53 but I'm interested in the reasoning in the fascination with the bizarre sense 2008-08-19 15:53 :-D 2008-08-19 15:54 linus holds similar views btw 2008-08-19 15:57 I know 2008-08-19 15:57 found a bug in btree probe, wow 2008-08-19 15:57 it's been in ddsnap all these years 2008-08-19 15:57 never had to probe for a key that already exists 2008-08-19 15:57 fact is, when I started developing on linux when I was at the university, I used gtk+ (1.1.something, IIRC) 2008-08-19 15:57 that horrified me so much, I quickly moved to qt :-) 2008-08-19 15:58 I know, I used to check the gtk web page every day to see all the amazing new ideas 2008-08-19 15:58 loved glade 2008-08-19 15:58 glade was good, yeah 2008-08-19 15:58 that was before I learned about oop and the lack of it in gnome thinking 2008-08-19 15:58 gtk is one 
of the things that really hurts linux desktop adoption now 2008-08-19 15:59 particularly its use in moz firefox 2008-08-19 15:59 but corba, bonobo, gnomevfs (lack of use in gnome applications, actually), etc screwed things royally 2008-08-19 15:59 really 2008-08-19 15:59 clusterfsck 2008-08-19 15:59 reinvention of KIO as GIO is the last great idea from gtk people 2008-08-19 15:59 serial braindamage 2008-08-19 16:00 dbus is a mess 2008-08-19 16:00 dcop was so nice 2008-08-19 16:00 seemed to work 2008-08-19 16:00 now uses dbus, right? 2008-08-19 16:00 dbus has serious borkness 2008-08-19 16:00 something invented in a day after getting beer drunk must really work well :-) 2008-08-19 16:00 yes, now dbus 2008-08-19 16:00 I remember 2008-08-19 16:00 used X ICE 2008-08-19 16:01 :-) 2008-08-19 16:01 there is also some good stuff in dbus 2008-08-19 16:01 it's not the worst thing the gnome mafia came up with 2008-08-19 16:02 pop to level 1, 3 of 1 nodes 2008-08-19 16:02 [5908] devmap_blockio: read block c0de00000007 2008-08-19 16:02 [5908] devmap_blockio: Failed assertion "dev->bits >= 9 && dev->fd" 2008-08-19 16:02 whoops 2008-08-19 16:02 hmm 2008-08-19 16:03 shapor, getting close to sk8 o'clock? 
2008-08-19 16:46 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-19 17:29 -!- boom(~boom@c-76-117-208-224.hsd1.nj.comcast.net) has joined #tux3 2008-08-20 00:05 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-08-20 01:17 hey flips 2008-08-20 01:18 oh everybody is asleep now :) 2008-08-20 01:18 ACTION is awake 2008-08-20 01:21 yeah, it was nice meeting you the other day, I hope you had fun with us 2008-08-20 01:23 yeah, it was fun 2008-08-20 01:24 tried to absorb some of your scheduler speak, i don't know much about it 2008-08-20 01:30 there's a lot of activity trying to cross lock the rq so that you can move tasks across to another processor 2008-08-20 01:31 depending on the run category, FIFO or OTHER, you can migrate it directly or use a migration thread to facilitate the move 2008-08-20 01:31 problem with -rt is that the FIFO detection code can be very aggressive about scanning other run queues which can be a bit unbounded 2008-08-20 01:32 that can hold the spinlock protecting for a long time and can cause other kinds of contention if a lot of migration operations are happening 2008-08-20 01:32 if the algorithm is polynomial time this might cause severe contention on those locks 2008-08-20 01:32 have you folks thought about how to do online disk checking yet ? 2008-08-20 01:35 online fsck? 2008-08-20 01:36 yes 2008-08-20 01:37 have you folks thought about extending the page cache to do more sophisticated things like explicit buffer tracking ? 
2008-08-20 01:38 i haven't, i believe flips has mumbled about it 2008-08-20 01:39 yeah, I wonder what i know or not is patented already 2008-08-20 02:14 -!- pgquiles(~pgquiles@239.Red-83-41-113.dynamicIP.rima-tde.net) has joined #tux3 2008-08-20 07:01 -!- pgquiles(~pgquiles@239.Red-83-41-113.dynamicIP.rima-tde.net) has joined #tux3 2008-08-20 13:21 the new iterative btree dump is way prettier than the recursive one and shorter 2008-08-20 13:54 shapor, next_key worked perfectly on the first try 2008-08-20 14:51 "Having actually run ZFS in production, there are some serious drawbacks with the remaining features (copy-on-write fragmentation, problems in SAN environments, etc), that may leave one wishing they'd implemented the ZFS features in a more stackable way so you could easily discard inappropriate layers and features" -- znork 2008-08-20 15:27 "You've run ZFS in production, yet you can't see the improvement on Linux's model? You mean the fact that md is completely broken and LVM is unreliable and slow by comparison?" -- outZider 2008-08-20 15:27 "Sir, I wish I had points to mod this up!" -- doomicon 2008-08-20 15:28 "ZFS is really, really nice but it does have some warts and the biggest for many would be that arcane operating system that's dangling off its nutsack" -- Kent Recal 2008-08-20 15:31 lol 2008-08-20 15:31 "Maybe he wants a cluster file system, or one that does HSM. I know I do and ZFS as of today does neither. ZFS is designed for managing a bunch of direct attached hard disks in thumper or similar device. At anything else it is frankly a bit sucky." 
jabuzz 2008-08-20 16:13 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-20 23:46 I need a clever 16 bit hex magic number for the volume tree 2008-08-20 23:47 existing magics are 0x1eaf for file index leaves and 0x90de for inode table leaves 2008-08-20 23:47 could be 32 bits, since this table is special and size doesn't matter 2008-08-20 23:50 for now the magic is 0x2008 2008-08-21 01:27 hey 2008-08-21 01:28 flips: cow allocation is pretty important 2008-08-21 01:28 to prevent fragmentation 2008-08-21 01:30 really 2008-08-21 01:30 so there are some concepts being considered 2008-08-21 01:30 delayed writes help I guess 2008-08-21 01:30 they should 2008-08-21 01:30 ACTION is tired today 2008-08-21 01:30 there is a concept of generating function driven goals 2008-08-21 01:31 been up since about 12pm and have been in front of a computer for most of that time 2008-08-21 01:31 12 pm... 25 hours ago? 2008-08-21 01:31 or only 13? 2008-08-21 01:32 flips: 12pm != midnight ;) 2008-08-21 01:32 12am is 25+ hours ago 2008-08-21 01:32 midday 2008-08-21 01:32 anyway, the idea is when data writes do collide with other versions of the data, either for atomic commit reasons or because a snapshot is held, then the write gets bounced away to a new goal, and if it collides there, to a further away goal 2008-08-21 01:32 the thing is, related data should get bounced to a similar place 2008-08-21 01:33 how do you choose where? 
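The three leaf magics mentioned above (0x1eaf for file index leaves, 0x90de for inode table leaves, and the provisional 0x2008 for the volume tree) could be checked with a helper like the one below. This is only a sketch. It assumes the magic is the first 16-bit field of a block and is stored in host byte order, and the log states neither. `leaf_kind` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Magic values quoted in the log; 0x2008 was described as "for now". */
enum {
	MAGIC_FILE_INDEX_LEAF  = 0x1eaf,
	MAGIC_INODE_TABLE_LEAF = 0x90de,
	MAGIC_VOLUME_TREE      = 0x2008,
};

/* Assumption: the magic is the first 16-bit field of the block, in host
 * byte order. Returns a description, or NULL for an unknown magic. */
const char *leaf_kind(const void *block)
{
	uint16_t magic;

	assert(block != NULL);
	memcpy(&magic, block, sizeof magic); /* avoids unaligned access */
	switch (magic) {
	case MAGIC_FILE_INDEX_LEAF:  return "file index leaf";
	case MAGIC_INODE_TABLE_LEAF: return "inode table leaf";
	case MAGIC_VOLUME_TREE:      return "volume tree";
	default:                     return NULL;
	}
}
```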
2008-08-21 01:34 another concept is to avoid completely filling any given region, which would interfere with placing the small amount of metadata in the region that is needed to do a certain atomic commits 2008-08-21 01:34 first bounce to a little higher, them more higher, and even more, then try a little lower, then bounce way far away 2008-08-21 01:34 generating function decides 2008-08-21 01:35 like a quadratic hash 2008-08-21 01:35 if you can keep say 4MB globs of data together, it doesn't matter much that it is stored far from its inode 2008-08-21 01:36 the seek time ends up about 10% of the transfer time, which is ok 2008-08-21 01:36 it's when you have lots of itty bitty pieces scattered around that seeking gets dominant 2008-08-21 01:36 there is also a concept of keeping an allocation density per region 2008-08-21 01:37 say, per 128 MB region 2008-08-21 01:37 so the bounce function could take that into account 2008-08-21 01:39 rewrite of a 1 GB file is not necessarily as scary as it sounds, if the truncate is committed first and synced, the the old blocks can be freed and rewritten 2008-08-21 01:39 if snapshotted, you want to take a huge bounce far away 2008-08-21 01:39 so the bounce function needs to take the size of the file into account 2008-08-21 01:39 bigger file = bigger bounces 2008-08-21 01:39 ACTION has a massive headache 2008-08-21 01:40 ACTION recommends that people with headaches not think about impossible problems too much 2008-08-21 01:40 just get flash storage 2008-08-21 01:40 right 2008-08-21 01:40 so far, no actual coding has gone into allocation strategy 2008-08-21 01:41 i'd say relying on seekless storage would be a good first cut 2008-08-21 01:41 anyway, I now need to think of a name other than "btree" for the on disk representation of a btree root 2008-08-21 01:41 well that's happening by default 2008-08-21 01:42 but I want at least some minimal allocation policy right from the start 2008-08-21 01:42 based on inode number 2008-08-21 
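The bounce idea described above can be sketched as a probe sequence: a little higher, then higher, then higher still, then a little lower, then far away. Bigger files take bigger bounces, and a per-region allocation density is consulted along the way. Everything here is an assumption for illustration. That includes the offsets {1, 4, 9, -1, 64}, the 16-block minimum step, the percent-full density table, 4 KB blocks, and the hypothetical names `bounce_goal` and `choose_goal`. A too-full region stands in for "the write collided", and at this point no tux3 allocator code existed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounce schedule, in multiples of the bounce step. The
 * first three are squares, as in quadratic probing. */
static const int64_t bounce_pattern[] = { 1, 4, 9, -1, 64 };
#define BOUNCE_TRIES (sizeof bounce_pattern / sizeof bounce_pattern[0])

#define MIN_STEP      16u    /* assumption: smallest bounce, in blocks */
#define REGION_BLOCKS 32768u /* 128 MB regions, assuming 4 KB blocks */

/* Goal for a given attempt (0 = original goal), wrapping around the
 * volume. The step grows with file size: bigger file, bigger bounce. */
uint64_t bounce_goal(uint64_t goal, unsigned attempt,
		     uint64_t file_blocks, uint64_t volume_blocks)
{
	assert(volume_blocks <= INT64_MAX / 2 && goal < volume_blocks);
	assert(file_blocks <= INT64_MAX / 64 && attempt <= BOUNCE_TRIES);
	if (attempt == 0)
		return goal;
	int64_t vol = (int64_t)volume_blocks;
	int64_t step = (int64_t)(file_blocks > MIN_STEP ? file_blocks : MIN_STEP);
	int64_t g = (int64_t)goal + (bounce_pattern[attempt - 1] * step) % vol;
	if (g < 0)
		g += vol;
	else if (g >= vol)
		g -= vol;
	return (uint64_t)g;
}

/* density[r] is the percent of region r already allocated. Candidates in
 * regions fuller than max_density are skipped; if every candidate is
 * crowded, fall back to the original goal. */
uint64_t choose_goal(uint64_t goal, uint64_t file_blocks, uint64_t volume_blocks,
		     const unsigned char *density, unsigned max_density)
{
	for (unsigned attempt = 0; attempt <= BOUNCE_TRIES; attempt++) {
		uint64_t g = bounce_goal(goal, attempt, file_blocks, volume_blocks);
		if (density[g / REGION_BLOCKS] <= max_density)
			return g;
	}
	return goal;
}
```

For example, with a 40000-block file whose goal lies in a 90% full region, the first bounce lands 40000 blocks higher, in the next 128 MB region.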
01:43 roughly speaking, the idea is to allocate inodes in clumps, the clumps all belonging to files created in the same directory 2008-08-21 01:43 and the clumps scattered fairly far apart 2008-08-21 01:43 whys is that significant? 2008-08-21 01:43 to make a tar benchmark go fast? 2008-08-21 01:43 the file data will then be targetted to the region of that clump of inodes 2008-08-21 01:43 tar is a big deal 2008-08-21 01:44 but in general, inodes should be near their directories and data block should be near the inodes 2008-08-21 01:44 would be better to put data in a place where the head will be when you are most likely to need it 2008-08-21 01:44 because the patter goes: look up dirent; open inode; read data 2008-08-21 01:44 i like the idea of spraying the drive with data if its idle 2008-08-21 01:44 files in the same directory tend to have some relationship to each other 2008-08-21 01:45 why choose when you can write it in more than one place 2008-08-21 01:45 that is kind of what hammer does 2008-08-21 01:45 oh? 2008-08-21 01:45 what does hammer do? 2008-08-21 01:46 it sprays writes into roughly the region it thinks they should go, then the reblocking process comes along later and arranges things tidily 2008-08-21 01:46 also, this is the only way space is freed in hammer 2008-08-21 01:46 free blocks are 128 MB I think 2008-08-21 01:46 i'm talking about writing the same data more than once 2008-08-21 01:46 which are obtained by compacting via reblocking 2008-08-21 01:46 that takes more time 2008-08-21 01:47 not if the drive is idle 2008-08-21 01:47 what if it isn't? 
2008-08-21 01:47 takes less time to move the data later 2008-08-21 01:47 since you dont have to copy it, just erase an extra copy 2008-08-21 01:48 there might be something there 2008-08-21 01:48 you don't really want the disk spinning for minutes after a big episode of writes though 2008-08-21 01:48 if you have io's waiting, you obviously dont do it 2008-08-21 01:49 although if you have buffer laying around in ram, and the drive gets idle, why not write them somemore 2008-08-21 01:49 then you can trivially break it just by having a long running write, like untarring dozens of kernel trees 2008-08-21 01:49 true, the last point 2008-08-21 01:49 the drive should never be idle when there is a dirty buffer in cache 2008-08-21 01:49 big flaw in linux there 2008-08-21 01:49 or even clean! 2008-08-21 01:49 :P 2008-08-21 01:50 well 2008-08-21 01:50 not so sure that writing out clean data is a win 2008-08-21 01:50 if you know it should be migrated, sure it might be a good time to migrate 2008-08-21 01:50 opportunistic defrag 2008-08-21 01:50 but, that will be slow 2008-08-21 01:50 because? 2008-08-21 01:51 if it's in cache its just a write 2008-08-21 01:51 because you have to seek to do it 2008-08-21 01:51 ah in cache 2008-08-21 01:51 true 2008-08-21 01:51 yeah... 
writing clean data is kind of a crazy idea 2008-08-21 01:51 the thing is, most of the badly fragmented stuff won't be in cache 2008-08-21 01:51 but there still might be a slight win 2008-08-21 01:52 there is a something similar planned 2008-08-21 01:52 you could do it if you are reading 2008-08-21 01:52 that is the so called log rollup 2008-08-21 01:52 say you read a heavily fragmented file 2008-08-21 01:52 right 2008-08-21 01:52 it ends up in buffers 2008-08-21 01:52 good point 2008-08-21 01:52 you should neve pay that high price again 2008-08-21 01:52 paint it down in some free space 2008-08-21 01:52 and update metadata 2008-08-21 01:52 you choose a new allocation goal for the whole file, then take the opportunity to migrate it, since you had to read it anyway 2008-08-21 01:52 yes 2008-08-21 01:53 however 2008-08-21 01:53 there better be some write activity going on at the same time 2008-08-21 01:53 people don't really like when writing happens when you are just reading 2008-08-21 01:53 like atime 2008-08-21 01:53 course maybe nobody will notice 2008-08-21 01:54 no one will care 2008-08-21 01:54 fragmentation is the biggest problem with cow style filesystems 2008-08-21 01:54 almost* 2008-08-21 01:54 it should be an advantage that tux3 will not rewrite nearly as much metadata 2008-08-21 01:54 i'm reading a bit about it and i like the hammer approach 2008-08-21 01:54 I like hammer too 2008-08-21 01:55 I want to get on lkml and advocate somebody start porting 2008-08-21 01:55 nice and simple really 2008-08-21 01:55 for what it does, yes 2008-08-21 01:55 see, we got a new file checked in 2008-08-21 01:55 volume.c 2008-08-21 01:56 i'm not sold on always trying to defragment files though 2008-08-21 01:56 that will suck on a lot of workloads 2008-08-21 01:56 tomorrow I will try and actually have it reference the master inode table 2008-08-21 01:56 like a log file server 2008-08-21 01:56 the allocator has to try hard to lay down the data in a reasonable place on the 
first try 2008-08-21 01:57 would be nice if there was some userspace interface to opportunistic readahead 2008-08-21 01:57 say you have a log file server which is appead mostly 2008-08-21 01:58 then you want to grep all the logs for something 2008-08-21 01:58 drive seeks because files are all badly fragmented 2008-08-21 01:59 there is 2008-08-21 01:59 fadvise 2008-08-21 01:59 no that doesn't help 2008-08-21 01:59 I recall you explaining this before 2008-08-21 01:59 need to explain again ;-) 2008-08-21 02:00 i want to sweep the drive once and grep all the files i read 2008-08-21 02:00 idealy ;) 2008-08-21 02:00 I think I can handle the append slowly case 2008-08-21 02:00 a heuristic is triggered when the log file grows to a certain size and is opened for append 2008-08-21 02:01 then, the file will grow in chunks 2008-08-21 02:01 hm 2008-08-21 02:01 "big log file" trigger? 2008-08-21 02:01 hm 2008-08-21 02:01 the allocation goal function will choose a location to target the next chunk where there exists a fair amount of empty space 2008-08-21 02:01 and other things will be discouraged from squatting there 2008-08-21 02:01 could just profile access patterns in general 2008-08-21 02:01 could 2008-08-21 02:01 maybe should 2008-08-21 02:02 and just store that in ram 2008-08-21 02:02 but some important ones can be determined without much analysis 2008-08-21 02:02 doesn't need to be persistent 2008-08-21 02:02 unless the drive is idle of course ;) 2008-08-21 02:02 wow, zumstor built and passed tests with the mem monitor excised ;-) 2008-08-21 02:03 exactly 2.5 hrs 2008-08-21 02:03 true, and analyzing allocation pattern provides work for lazy cpus 2008-08-21 02:04 I think there may be some allocation "zones", for example, zones where 4 MB is the minimum allocation unit 2008-08-21 02:05 could profile directories too 2008-08-21 02:05 and no more than a single file is allowed in the same 4MB zone 2008-08-21 02:05 4MB chunk I mean 2008-08-21 02:05 directory x usually gets files 
that dont grow beyond 16kb 2008-08-21 02:05 right 2008-08-21 02:05 while directory y usually gets files that grow to 10gb 2008-08-21 02:05 and then they will be targetted to a small granularity zone 2008-08-21 02:05 yeah 2008-08-21 02:05 and new inode table blocks may be created in that zone too 2008-08-21 02:06 that is the beauty of variable attributes 2008-08-21 02:06 eventually, the original inode table blocks of a directory that was "mispredicted" might be moved to the new, more appropriate zone 2008-08-21 02:06 can just add more on the fly even 2008-08-21 02:06 yes 2008-08-21 02:07 or disable them altogether on flash 2008-08-21 02:07 there is also a concept of inode numbers "folding" over the volume 2008-08-21 02:07 so that two inode numbers very far apart can have allocation goals into the same physical region 2008-08-21 02:07 why do inode numbers matter 2008-08-21 02:08 the inode number determines the physical allocation goal 2008-08-21 02:08 the initial goal anyway 2008-08-21 02:08 so you set the allocation goal for a given file by choosing the inode number 2008-08-21 02:09 so that is saying your primary goal is to place it close to other files in the same directory? 2008-08-21 02:09 yes, and place the data near the inode 2008-08-21 02:09 that could be totally wrong 2008-08-21 02:09 example? 2008-08-21 02:09 maildirs 2008-08-21 02:10 directories full of files, one per message in your mailbox 2008-08-21 02:11 usually just add new files one at a time 2008-08-21 02:11 why is it wrong to place the data near the inode then? 2008-08-21 02:11 read them 1 or 2 at a time, never access them again 2008-08-21 02:11 that is, the file data near the file inode 2008-08-21 02:11 hm there must be a better.. er worse case 2008-08-21 02:11 what if you search your mailbox? 2008-08-21 02:12 "don't do that" ? 
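The scheme above uses the inode number as the initial allocation goal and lets inode numbers "fold" over the volume. A simple modular mapping can illustrate both ideas. The linear layout and the `blocks_per_inode` parameter are assumptions, since the log does not say how the fold is computed. `inode_goal` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mapping: each inode number owns blocks_per_inode blocks
 * of the volume, and inode numbers past the end fold back to the start,
 * so far-apart inode numbers can share a physical region. */
uint64_t inode_goal(uint64_t inum, uint64_t blocks_per_inode, uint64_t volume_blocks)
{
	assert(blocks_per_inode > 0);
	uint64_t slots = volume_blocks / blocks_per_inode;
	assert(slots > 0);
	return (inum % slots) * blocks_per_inode;
}
```

With a 1000-block volume and 10 blocks per inode, inode 5 and inode 105 both map to goal block 50. Choosing the inode number therefore chooses where the file's data will first be aimed.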
2008-08-21 02:12 depends on how brain dead the mail server software is 2008-08-21 02:12 most are pretty brain dead 2008-08-21 02:13 some keep a keywords index db file because search is slow 2008-08-21 02:13 grep * with 20000 files is slow 2008-08-21 02:13 although if you do need to do that, it woud be nice not to see 2008-08-21 02:13 seek* 2008-08-21 02:14 now that would be a cool system call 2008-08-21 02:14 "search these files for this pattern" 2008-08-21 02:14 and please dont seek 2008-08-21 02:15 for that kind of grep you want to ls -U | grep foo 2008-08-21 02:15 err 2008-08-21 02:15 well like that 2008-08-21 02:15 |xargs 2008-08-21 02:15 right 2008-08-21 02:15 hrm never thought of that 2008-08-21 02:15 smrt 2008-08-21 02:16 htree will then provide the entries in hash order 2008-08-21 02:16 no better than lexical order 2008-08-21 02:16 hm 2008-08-21 02:16 but phtree will provide them in physical order 2008-08-21 02:16 things will sing 2008-08-21 02:17 that sounds really sucky of htree 2008-08-21 02:17 btrfs guys are busy inplementing the htree idea 2008-08-21 02:17 htree is very fast as most things 2008-08-21 02:17 but it's not the best solution 2008-08-21 02:17 imho 2008-08-21 02:18 htree is really good for huge volumes when nothing is in cache 2008-08-21 02:18 with the caveat that the above load will still suck 2008-08-21 02:18 no matter what allocation strategy is used the key will be benchmarking common worklodas 2008-08-21 02:18 and making sure they sing 2008-08-21 02:18 right 2008-08-21 02:18 untarring kernel trees is one of the important ones 2008-08-21 02:18 and also trying to tickle worst cases 2008-08-21 02:18 then grep the kernel tree, stuff like taht 2008-08-21 02:20 would be neat to hint allocation strategy with ioctls or something 2008-08-21 02:20 similar idea to fadvise 2008-08-21 02:20 "this is a log file" 2008-08-21 02:20 or "this file will never be more than 8k" 2008-08-21 02:21 struct root { u64 block:48, levels:8, unused:8; }; 2008-08-21 
02:21 struct btree { struct root root; u16 entries_per_leaf; }; 2008-08-21 02:21 but if fadvise is any indicator, such an interface would never get used 2008-08-21 02:21 sadly 2008-08-21 02:21 s/never/very rarely/ 2008-08-21 02:22 well we can make a ddlink interface and you can go crazy with hints 2008-08-21 02:22 see what works 2008-08-21 02:22 most important thing though is to act fairly reasonable in common loads 2008-08-21 02:23 yeah because all those great ideas go to shit if you are serving the volume over nfs 2008-08-21 02:23 and every write is sync too, that hurts 2008-08-21 02:23 heh 2008-08-21 02:23 sync has to be fast 2008-08-21 02:23 I think tux3 will have a really fast sync 2008-08-21 02:23 hammer would probably kick all ass as an nfs server 2008-08-21 02:23 because of the forward log thing 2008-08-21 02:24 quite possibly 2008-08-21 02:45 flips: see the mail on the list? 2008-08-21 02:45 ACTION looks 2008-08-21 02:46 so people are reading your messags afterall ;) 2008-08-21 02:46 :-) 2008-08-21 02:47 so hopefully it will be less of a blog in future 2008-08-21 02:47 or at least one that gets lots of comments ;) 2008-08-21 02:51 ok, time to respond 2008-08-21 02:51 just checked in a big splat change 2008-08-21 02:51 need to restructure the way args are passed to the btree methods somewhat 2008-08-21 02:51 so that leaf methods can use fields in the struct btree 2008-08-21 02:52 anyway... microchange 2008-08-21 02:52 but macro patches to do it 2008-08-21 02:53 86 members on tux3 now 2008-08-21 02:53 just passed zumastor a little while ago 2008-08-21 02:53 we need to get to a beanery the day it passes 100 2008-08-21 04:37 -!- konrad(~konrad@c-24-16-74-109.hsd1.wa.comcast.net) has joined #tux3 2008-08-21 06:07 flips: you going to the linux plumbers conf? 
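The `struct root` declaration above uses bitfields to pack a 48-bit block number and an 8-bit level count into a u64. Bitfield layout is implementation-defined, and GCC refuses `typeof` on a bit-field (the btree.c:451 error later in this log). One alternative is explicit shifts and masks. Putting the block number in the low 48 bits is an assumption chosen to resemble the common GCC little-endian layout, and tux3 is not confirmed to use it. On-disk byte order is not handled here, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ROOT_BLOCK_BITS 48
#define ROOT_BLOCK_MASK ((UINT64_C(1) << ROOT_BLOCK_BITS) - 1)

typedef uint64_t packed_root; /* block:48 | levels:8 | unused:8 */

static inline packed_root root_pack(uint64_t block, unsigned levels)
{
	assert(block <= ROOT_BLOCK_MASK && levels <= 0xff);
	return block | ((uint64_t)levels << ROOT_BLOCK_BITS);
}

static inline uint64_t root_block(packed_root r)
{
	return r & ROOT_BLOCK_MASK;
}

static inline unsigned root_levels(packed_root r)
{
	return (unsigned)((r >> ROOT_BLOCK_BITS) & 0xff);
}
```

The accessors return plain integers, so GNU C's `typeof(root_levels(r))` is accepted where `typeof` on the bitfield was not.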
2008-08-21 06:29 pgquiles, wasn't planning on it 2008-08-21 11:54 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-08-21 14:23 -!- caoliver(~oliver@75-134-208-20.dhcp.trcy.mi.charter.com) has joined #tux3 2008-08-21 14:29 shapor: ping 2008-08-21 14:36 wow- Tux3 finally got booted off the hottest messages list on lkml. hanging in there in the 1/2 life = 1 day list though 2008-08-21 14:58 tim_dimm: pong 2008-08-21 15:00 any response to your bind post? 2008-08-21 15:01 not on the list 2008-08-21 15:01 any privately? 2008-08-21 15:02 some guy from ISC replied, thanked me for the patch and said that it wouldn't compile on all platorms due to "compiler constructs" 2008-08-21 15:02 isn't (struct in_addr){ .s_addr = htonl(hst->ip)} ANSI C? 2008-08-21 15:05 shapor, it is C99 2008-08-21 15:05 ah i've gotten used to c99 i guess 2008-08-21 15:05 rewrite as .s_addr = htonl(hst->ip); 2008-08-21 15:06 obviously 2008-08-21 15:06 yeah 2008-08-21 15:06 boneheads over there sounds like 2008-08-21 15:06 didn't tell you the error message I bet 2008-08-21 15:06 no i had to ask what construct he was talking about 2008-08-21 15:07 they are in the business of intentially producing buggy software 2008-08-21 15:07 I am even more on the leading edge of insanity, write in gnu-c99 2008-08-21 15:07 fancy stuff 2008-08-21 15:07 the only practical difference I have noticed is, g99 has typeof 2008-08-21 15:07 it's beyond me how anybody can get by without it 2008-08-21 17:48 -!- MaZe(~MaZe@216-239-45-4.google.com) has left #tux3 2008-08-21 18:01 -!- caoliver(~oliver@75-134-208-20.dhcp.trcy.mi.charter.com) has joined #tux3 2008-08-21 20:09 -!- caoliver(~oliver@75-134-208-20.dhcp.trcy.mi.charter.com) has left #tux3 2008-08-21 23:30 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-08-22 00:49 folks 2008-08-22 00:50 flips: or put the metadata delta in a contiguous block run 2008-08-22 00:56 delayed allocation is your best friend, the allocator is 
going to be a pain in the ass 2008-08-22 01:59 true that 2008-08-22 02:01 bh, I don't get your comment about the metadata delta 2008-08-22 02:02 hey 2008-08-22 02:02 oh maybe I'm being stupid 2008-08-22 02:03 like changes in the b-tree itself might benefit from being contiguous 2008-08-22 02:03 you mean for replication? 2008-08-22 02:04 the principle is simple: changes in metadata do not mean a thing 2008-08-22 02:04 for dealing with changes to the b-tree itself, maybe make a distinction in how metadata is written versus data 2008-08-22 02:04 it is only changes in the logical data that have to be replicated 2008-08-22 02:04 the packing for that is known so it might benefit from a special treatment of that case 2008-08-22 02:04 oh, ok, like that log things you wrote up about ? 2008-08-22 02:05 log thing? 2008-08-22 02:05 that is a way of doing atomic commit 2008-08-22 02:05 yes, it matters 2008-08-22 02:05 because the log can contain part of the logical data 2008-08-22 02:05 if it has not been rolled up into the "real" fs structure yet 2008-08-22 02:25 -!- pgquiles(~pgquiles@239.Red-83-41-113.dynamicIP.rima-tde.net) has joined #tux3 2008-08-22 03:33 ACTION is about to head to bed 2008-08-22 08:22 -!- pgquiles(~pgquiles@239.Red-83-41-113.dynamicIP.rima-tde.net) has joined #tux3 2008-08-22 10:25 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-22 10:32 -!- pgquiles(~pgquiles@239.Red-83-41-113.dynamicIP.rima-tde.net) has joined #tux3 2008-08-22 10:33 -!- konrad(~konrad@c-24-16-77-169.hsd1.wa.comcast.net) has joined #tux3 2008-08-22 11:24 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-22 11:29 ah 2008-08-22 11:30 got through a big jogjam, inode creation is now in a plausible from 2008-08-22 11:30 going to do a call for testing pretty soon 2008-08-22 11:30 ACTION thinks about maze 2008-08-22 11:31 or more properly, a call for reality check on basic algorithms 2008-08-22 11:31 testing/debugging is too easy for the great minds on this 
channel ;-) 2008-08-22 11:53 ;-) 2008-08-22 12:15 hey, there's a guy who hasn't joined our tux3 LinkedIn List 2008-08-22 12:16 got to engage buttgears 2008-08-22 12:16 http://www.linkedin.com/e/gis/154012 2008-08-22 12:16 buttgears 2008-08-22 12:16 that's a new one to me 2008-08-22 12:17 is that like toosh_drive? 2008-08-22 12:17 geekified version of got to get ass in gear 2008-08-22 12:38 hey 2008-08-22 12:38 hi bh 2008-08-22 22:33 -!- konrad(~konrad@c-24-16-74-109.hsd1.wa.comcast.net) has joined #tux3 2008-08-23 00:48 flips: you there >? 2008-08-23 00:48 I think you need to put compression "blobs" in tux3 from the get go 2008-08-23 00:48 so that you can dedup things 2008-08-23 00:48 hi bh 2008-08-23 00:48 I think it's rather important to do that kind of thing because of the various enterprise uses of that 2008-08-23 00:49 potentially by folks like facebook and stuff. 2008-08-23 00:49 can I translate that as just have everything blob-ready? 2008-08-23 00:49 because it already is, kind of 2008-08-23 00:49 having a kind type file specific metadata for large linearly seeked files is another thing that's minor, but valuable as well 2008-08-23 00:50 you mean, with a radix tree instead of a btree? 2008-08-23 00:50 extents make large linear files quite nice 2008-08-23 00:50 flips: you might like to consider and experiment with various sizes of compression blobs to see how efficient the compression and storage is, then store the sha1 hash to "union" common file segments 2008-08-23 00:50 I think that's really important IMO 2008-08-23 00:51 wait, I'm merging two things into one 2008-08-23 00:51 forget compression, replace that completely with deduping 2008-08-23 00:51 maybe you can then extend that to compresssion as well using the same blob infrastructure 2008-08-23 00:51 how about deduping at the lvm level? 2008-08-23 00:51 no, at the file level 2008-08-23 00:51 why does deduping ahve to involve the filesystem? 
2008-08-23 00:52 file fragment level so that things like .jpgs and stuff can save on storage 2008-08-23 00:52 because I can't see how that can be done at the RAID level 2008-08-23 00:53 it's about how a file is represented as a piece of data in the fs 2008-08-23 00:53 that would be valuable for a lot of enterprise ready folks that use those kind of filers, you don't want to just add it on later and do a half as job at it 2008-08-23 00:53 use an extent-based interface to the lvm 2008-08-23 00:54 it is coming inevitably anyway 2008-08-23 00:54 how's that going to solve the problem ? 2008-08-23 00:54 it gives you variable length data 2008-08-23 00:54 well 2008-08-23 00:54 dedupping 2008-08-23 00:54 just not sure how that fits in, you mean extent per file segment or something like that ? 2008-08-23 00:54 I was thinking of the other 2008-08-23 00:54 ACTION is prepping for Burning Man tonight 2008-08-23 00:54 ok, so you want to have the filesystem reference data with finer granularity than blocks? 2008-08-23 00:55 so that you can do real micro-idenfication of similar data? 2008-08-23 00:55 no, large than blocks, with the magic size of a "blob" 2008-08-23 00:55 what is the difference between a blob and an extent/ 2008-08-23 00:56 so that you can do an incremental back up or something like that for, say, a 32k commit/write and have that be backed already by another segment existing on the disk 2008-08-23 00:56 there's a tradeoff between the metadata size and the savings of space. 
2008-08-23 00:56 I'm just making up the term "blob" for a compression cluster of blocks 2008-08-23 00:56 contiguous in a file 2008-08-23 00:57 replication is a decent argument 2008-08-23 00:57 because the filesystem should replicate at the filesystem level 2008-08-23 00:57 this is heavy enough that it might benefit from scoping out which parts of the file system do it, say, by volume or specific directory 2008-08-23 00:57 a filesystem like this one anyway 2008-08-23 00:58 flips: I'm trying to communicate this to you because I think it's critically important' 2008-08-23 00:58 you will succeed in getting me thinking about it 2008-08-23 00:58 the fact's a file segment or cluster shouldn't matter since they might be able to use the same generic framework 2008-08-23 00:58 encryption would also fall into this category 2008-08-23 00:58 having multiple pointers pointing at the same blob is currently an alien concept to tux3 2008-08-23 00:59 how do you know when the blob can be released? 2008-08-23 00:59 well, give some thought and maybe you'll think it's important enough to change it 2008-08-23 00:59 not sure, good question 2008-08-23 00:59 I'd imagine it would be similar to the hard link problem 2008-08-23 00:59 maybe there is a good answer 2008-08-23 00:59 ref counting is not nice 2008-08-23 01:00 you have to have all those counts persistent 2008-08-23 01:00 yeah, well, it's better to do stuff like this up front if you decided you need it 2008-08-23 01:01 well I suppose refcounts could be done like extents 2008-08-23 01:01 and you only incur the overhead if using deduping 2008-08-23 01:01 which you expect to go slower I would hope 2008-08-23 01:02 pluggable btree leaf formats as tux3 has now gives you all the blob referencing machinery you need 2008-08-23 01:04 bh: so you're saying file-level checksumming for deduping? 2008-08-23 01:08 or.. 
a blob would be larger than an extent and smaller than a whole file 2008-08-23 03:29 zfs has dnodes and znodes, I wonder what those are 2008-08-23 03:34 the problem with "de-duping" with extents will be alignment 2008-08-23 03:34 yes, you have to include a header in the blob 2008-08-23 03:34 but alignment is not nearly as serious an issue with extent based filesystem as block based 2008-08-23 03:34 well even if you store it in the metadata 2008-08-23 03:35 if 2 files have a common chunk, it needs to be broken up in to extents the same way in both 2008-08-23 03:35 even if the files are identical 2008-08-23 03:35 the approximate size would be in the metadata, exact size in the blob 2008-08-23 03:35 they could be broken up in to extents different 2008-08-23 03:35 differently* 2008-08-23 03:36 the object is just to identify common blocks? 2008-08-23 03:36 and not arbitrary regions? that can also be done 2008-08-23 03:36 yeah, single instance storage, right? 2008-08-23 03:36 how do you decide where to draw the line? 2008-08-23 03:36 i think that is what bh was suggesting 2008-08-23 03:37 there is a vanishingly small chance it will get into the prototype implementation 2008-08-23 03:38 therefore the object of the exercise must be to see how we could be blob friendly 2008-08-23 03:38 i think the only reasonable way to do it is in the background 2008-08-23 03:38 and sharing between files 2008-08-23 03:38 not take a lot of decisions that would make it hard to do bloby things 2008-08-23 03:38 ugly code 2008-08-23 03:38 if sharing can be supported a background thing could be added on 2008-08-23 03:38 it will be very ugly 2008-08-23 03:39 will it really? 2008-08-23 03:39 another layer of indirection? 
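The alignment problem raised above shows up clearly with fixed-size blobs. If one file has a single extra byte in front of otherwise identical data, none of its blob boundaries line up. The sketch below hashes fixed-size blobs and counts matches. FNV-1a is swapped in for SHA-1 only to keep the example dependency-free. A real deduplicator would also compare the bytes after a hash match instead of trusting the hash. `shared_blobs` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a: a cheap stand-in for SHA-1 in this sketch. */
static uint64_t blob_hash(const unsigned char *p, size_t n)
{
	uint64_t h = UINT64_C(14695981039346656037);

	for (size_t i = 0; i < n; i++) {
		h ^= p[i];
		h *= UINT64_C(1099511628211);
	}
	return h;
}

/* Count the full blobs of a (cut at multiples of blob bytes) whose hash
 * matches some full blob of b cut the same way. Quadratic, for clarity. */
size_t shared_blobs(const unsigned char *a, size_t alen,
		    const unsigned char *b, size_t blen, size_t blob)
{
	size_t shared = 0;

	assert(blob > 0);
	for (size_t i = 0; i + blob <= alen; i += blob) {
		uint64_t ha = blob_hash(a + i, blob);
		for (size_t j = 0; j + blob <= blen; j += blob) {
			if (blob_hash(b + j, blob) == ha) {
				shared++;
				break;
			}
		}
	}
	return shared;
}
```

Comparing a 4096-byte buffer with itself in 512-byte blobs finds all 8 blobs shared. Prefixing one copy with a single byte can drop the count to 0. This is why two files with a common chunk would need to be split up the same way for the sharing to be found.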
2008-08-23 03:39 heh 2008-08-23 03:39 doing it in the lvm would be cleaner, provided an extent based interface is available 2008-08-23 03:39 to the lvm 2008-08-23 03:39 that would be pretty neat 2008-08-23 03:39 we sort of have something like that already, namely bio 2008-08-23 03:40 work with all filesystems 2008-08-23 03:41 and you could only enable it on the "slow" data device 2008-08-23 03:41 not the fast metadata one 2008-08-23 03:41 right 2008-08-23 03:41 I'd like to see a killer argument why it has to be done in the filesystem 2008-08-23 03:41 can you change the bio size? 2008-08-23 03:42 yes 2008-08-23 03:42 especially with my stacking patch 2008-08-23 03:42 so the device will only accept 4k bios 2008-08-23 03:42 bio size can even be changed on the fly 2008-08-23 03:42 while the bio is in flight 2008-08-23 03:42 bios are pretty loosely goosey 2008-08-23 03:54 shapor, a problem with your ownership inheritence suggestion: a file does not know what directory it is in. 2008-08-23 03:54 so can't inherit anything from the directory 2008-08-23 03:54 it can however inherit from its inode table block 2008-08-23 03:54 which gives the same effect 2008-08-23 03:54 hrm 2008-08-23 03:56 "Shapor has suggested that there be per-directory default uid, gid and 2008-08-23 03:56 mode attributes, so any file with exactly those attrbutes does not have 2008-08-23 03:56 to represent ownership at all, but inherits it from the inode table 2008-08-23 03:56 block it lives in. Allocation policy will be such that its neighbours 2008-08-23 03:56 are likely to have inentical ownership." 
2008-08-23 03:57 permissions also 2008-08-23 03:57 ah "mode attributes" 2008-08-23 03:57 glazed over that 2008-08-23 03:57 right 2008-08-23 03:58 the idea could be extended to acls with some effort 2008-08-23 04:09 sleepy time 2008-08-23 04:19 had to play the new star wars ps3 demo, it's awesome 2008-08-23 04:19 advances the state of the art of that kind of game 2008-08-23 04:19 unlike the recent movies 2008-08-23 04:53 -!- pgquiles(~pgquiles@239.Red-83-41-113.dynamicIP.rima-tde.net) has joined #tux3 2008-08-23 12:19 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-08-23 13:01 so how big will the tux3 superblock be 2008-08-23 13:01 traditional 4k? 2008-08-23 13:01 smaller I think 2008-08-23 13:02 maybe 1K or 512 bytes 2008-08-23 13:02 a tux3 filesystem will never be smaller than 4K, but not the entire 4K needs to be valid superblock data 2008-08-23 13:02 so it can be probed by reading 4K to find the superblock magic and block size 2008-08-23 13:02 does variable size make sense? 2008-08-23 13:03 variable size superblock? would would you do with the extra space on large block size? 2008-08-23 13:03 I think, just let it be 2008-08-23 13:03 text description of the filesystem? heh i dunno 2008-08-23 13:03 bitmaps of the devs 2008-08-23 13:04 pi to as many digits that will fit? 2008-08-23 13:04 the bigger the block size the more detailed pix you have of the devs 2008-08-23 13:04 hah, we dont want to scare people 2008-08-23 13:04 no, especially not topless rollerskating pictures of the devs 2008-08-23 13:04 at any resolution 2008-08-23 13:04 yikes 2008-08-23 13:04 speaking of which... 2008-08-23 13:05 isn't there something happening in venice today? 
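The probing idea discussed here can be sketched as follows: read the first 4K, which is always safe because the filesystem is never smaller, then look for a magic number and the block size. The magic string, the byte layout and the accepted range of block sizes (512 bytes to 64K) are assumptions for illustration, not the Tux3 on-disk format. The block size is stored as a shift, so a single byte describes any power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PROBE_SIZE   4096        /* never larger than the filesystem */
#define SB_MAGIC     "TUX3SKEL"  /* hypothetical 8-byte magic */
#define SB_MAGIC_LEN 8

/* Hypothetical layout: bytes 0..7 magic, byte 8 = log2(block size).
 * Returns the block size in bytes, or 0 if no valid superblock. */
static unsigned long probe_blocksize(const unsigned char buf[PROBE_SIZE])
{
	unsigned bits;

	if (memcmp(buf, SB_MAGIC, SB_MAGIC_LEN) != 0)
		return 0;
	bits = buf[SB_MAGIC_LEN];
	if (bits < 9 || bits > 16)  /* assumed range: 512 bytes .. 64K */
		return 0;
	return 1UL << bits;
}
```

Only the first few bytes of the probe buffer need to be valid superblock data. The rest of the 4K is simply ignored.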
2008-08-23 13:06 starts at noon 2008-08-23 13:06 time to get a coffee in me and get skates on 2008-08-23 13:28 hey flips 2008-08-23 13:28 ACTION fell asleep last night fairly suddenly 2008-08-23 13:30 flips: shapor yeah, that's kind of what I meant, other than, maybe a mechanism that can dedup it at the userspace level 2008-08-23 13:30 maybe it's not the role of the FS to do that, just a suggestion 2008-08-23 13:30 but it shouldn't complicate the prototype imo because getting it out is more important 2008-08-23 13:33 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-08-23 13:38 bh, see if you can find a killer argument why the fs has to do it 2008-08-23 13:38 as opposed to the lvm 2008-08-23 13:38 assuming an extent-aware lvm 2008-08-23 13:39 (note for lvm3 design) 2008-08-23 14:57 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-08-23 14:58 hey tim_dimm 2008-08-23 14:58 hey 2008-08-23 14:58 coding away I would assume 2008-08-23 14:58 yes 2008-08-23 14:58 I'll roll out for a skate sometime 2008-08-23 14:58 I'm doing house duty today 2008-08-23 15:17 btree.c:451: error: 'typeof' applied to a bit-field <- lame :p 2008-08-23 15:17 lazy gcc devs 2008-08-23 15:17 lazy & ugly 2008-08-23 15:46 g99 is too advanced 2008-08-23 16:29 [11882] tuxread: read 0/c 2008-08-23 16:29 got 12 bytes 2008-08-23 16:29 0xbf93990c: 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 "hello world!" 
2008-08-23 16:47 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-08-23 16:49 flips: all tests compile (without warnings) and run on 64 bit 2008-08-23 16:51 :-) 2008-08-23 16:51 I tried to fill in the (L)s where needed 2008-08-23 16:53 sk8 oclock 2008-08-23 17:28 ah, getting close anyway 2008-08-23 17:28 now should I skate first or have another coffee first 2008-08-23 17:28 leaning towards the latter 2008-08-23 17:28 makes the skate more interesting 2008-08-23 17:30 make it an irish coffee 2008-08-23 17:30 and see how far you can make it before you fall 2008-08-23 17:30 not without moral support 2008-08-23 17:30 get joelle down for one ;-) 2008-08-23 17:31 you married folks are too boring 2008-08-23 17:32 oh, I misspoke in my latest tux3 post 2008-08-23 17:32 hah, not quite 2008-08-23 17:32 I can make a 1EB file no problem, even if bitmaps can't map 1EB 2008-08-23 17:33 the block doesn't have to be high up at all 2008-08-23 17:33 but I will put it up high anyway 2008-08-23 17:33 1EB/2 - 1 2008-08-23 17:33 just set the allocation goal at 1EB/2 - 100 or so 2008-08-23 17:34 the whole fs will be allocated way up there 2008-08-23 17:34 possibly turning up some unhandled boundary conditions ;-) 2008-08-23 17:36 why? 2008-08-23 17:37 because it happens 2008-08-23 17:37 on occasion 2008-08-23 17:37 wait 2008-08-23 17:37 most of the big changes I made over the last couple weeks turned up something unhandled 2008-08-23 17:37 how are you going to allocate blocks at 1eb/2 ? 2008-08-23 17:37 you have a device that big? 2008-08-23 17:37 or going to use sparse file as your device? 2008-08-23 17:37 use a sparse file for the volume 2008-08-23 17:38 well 2008-08-23 17:38 what fs supports that big a spare file? 2008-08-23 17:38 you're right 2008-08-23 17:38 can't be quite that high 2008-08-23 17:38 16 TB 2008-08-23 17:38 pretty far off ;) 2008-08-23 17:38 your 64 bit system should be able to do 1 EB / 2 2008-08-23 17:38 on ext3? 
2008-08-23 17:38 I'lll make it dependent on sizeof(int) so that gets tested 2008-08-23 17:38 yes 2008-08-23 17:39 let me see 2008-08-23 17:39 or now 2008-08-23 17:39 :) 2008-08-23 17:39 or no 2008-08-23 17:39 yeah i'm checking 2008-08-23 17:39 ext4 2008-08-23 17:39 ext4? 2008-08-23 17:39 good chance to eval ext4 2008-08-23 17:39 ext4 of course 2008-08-23 17:39 doubles the size of all pointers etc 2008-08-23 17:42 gah, I have written tuxopen in the wrong order 2008-08-23 17:42 need to create the inode before creating the dirent 2008-08-23 17:42 blush 2008-08-23 17:43 dd: truncating at 1125899906842624 bytes in output file `t': File too large 2008-08-23 17:43 on ext3 2008-08-23 17:43 want to try 16tb - 1 while you're at it? 2008-08-23 17:43 then 16TB even? 2008-08-23 17:44 1tb is ok 2008-08-23 17:44 2tb is not 2008-08-23 17:44 1 tb + 1? 2008-08-23 17:45 ah, there is some other lame limitaion in ext3 2008-08-23 17:45 the horrors are slowly coming back to me 2008-08-23 17:45 2tb - 1 seems to be the limit 2008-08-23 17:45 something about signed offsets 2008-08-23 17:46 the highest signed offset 2008-08-23 17:46 which is lame 2008-08-23 17:46 er maybe exactly 2t 2008-08-23 17:46 I would hope exactly 2008-08-23 17:46 since i'm using seek 2008-08-23 17:47 using 64 bit seek I suppose 2008-08-23 17:47 because of 64 bit system 2008-08-23 17:47 hrmwell i'm doing bs=1M 2008-08-23 17:47 skip=2M 2008-08-23 17:47 er seek=2M 2008-08-23 17:47 http://en.wikipedia.org/wiki/Ext3 2008-08-23 17:48 maybe dd is mathing it out 2008-08-23 17:48 Max file size 2 TiB 2008-08-23 17:48 now what is the boneheaded limit 2008-08-23 17:48 shoudl be 16 TB, the size that the page cache can handle 2008-08-23 17:48 ftruncate(1, 2199023255552) = -1 EFBIG (File too large) 2008-08-23 17:49 somebody lied? 2008-08-23 17:49 or do they not mean by limit what they think they mean? 
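The limits under discussion can be checked with some arithmetic. This sketch makes several assumptions:

* 4 KiB blocks.
* 4-byte ext3 block pointers, so 1024 per index block.
* A 32-bit i_blocks count in 512-byte sectors. This is the "measuring blocks in sectors" explanation.
* A 32-bit page cache index with 4 KiB pages.

The triple-indirect tree can map a little over 2^42 bytes. The sector count caps out at exactly 2^41 = 2199023255552 bytes, which is the length ftruncate rejected. The smaller limit wins. The last helper shows why an allocation goal near 2^47 blocks cannot live in a sparse file on ext3.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_BITS     12                          /* 4 KiB blocks */
#define PTRS_PER_BLOCK ((1u << BLOCK_BITS) / 4)    /* 4-byte pointers */

/* Bytes mappable by ext2/3 style indexing: 12 direct pointers plus
 * single, double and triple indirect blocks. */
static uint64_t ext3_index_limit(void)
{
	uint64_t p = PTRS_PER_BLOCK;
	uint64_t blocks = 12 + p + p * p + p * p * p;

	return blocks << BLOCK_BITS;
}

/* Bytes countable by a 32-bit i_blocks field in 512-byte sectors. */
static uint64_t sector_count_limit(void)
{
	return ((uint64_t)1 << 32) * 512;
}

/* Bytes addressable through a 32-bit page cache index, 4 KiB pages. */
static uint64_t pagecache32_limit(void)
{
	return ((uint64_t)1 << 32) << 12;
}

/* Byte offset of a block number within a sparse backing file. */
static uint64_t block_offset(uint64_t block)
{
	return block << BLOCK_BITS;
}
```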
2008-08-23 17:50 Linux yzf.shapor.com 2.6.18-6-amd64 #1 SMP Mon Jun 16 22:30:01 UTC 2008 x86_64 GNU/Linux 2008-08-23 17:51 from man ftruncate 2008-08-23 17:51 EFBIG The argument length is larger than the maximum file size. (XSI) 2008-08-23 17:51 the 2 TB limit comes from the structure of the ufs-style index, maybe 2008-08-23 17:51 oh might have to do with block size 2008-08-23 17:51 it does 2008-08-23 17:51 but ext3 is pretty much always 4K 2008-08-23 17:52 except if you make a fs on a floppy 2008-08-23 17:52 so maybe it only supports 16tb if you make it larger than 4k 2008-08-23 17:52 anyway, Tux3 will exactly hit its limits, not be off by one 2008-08-23 17:53 branching factor is 2^10 for ext2/3 index block 2008-08-23 17:53 Block size: 4096 2008-08-23 17:54 triple indirect is 10 + 10 + 10 + 12 bits 2008-08-23 17:54 42 bits 2008-08-23 17:54 add in some braindamage for signedness, and maybe that is the limit 2008-08-23 17:54 maybe not 2008-08-23 17:55 albert cahalan has a post from a few years back 2008-08-23 17:55 treats the question accurately 2008-08-23 17:55 hrm ext3 limit wasn't always 16t 2008-08-23 17:55 unlike me at the moment ;-) 2008-08-23 17:55 i see some discussions of people trying to get it to work 2008-08-23 17:55 back in '06 2008-08-23 17:55 and my kernel is pretty old 2008-08-23 17:56 hrm no that was fs size 2008-08-23 17:56 not file size 2008-08-23 17:57 flips: you suck at reading comprehension 2008-08-23 17:57 from wikipedia 2008-08-23 17:57 Max file size 2 TiB 2008-08-23 17:57 http://lwn.net/Articles/91731/ 2008-08-23 17:58 I pasted that above 2008-08-23 17:58 oh 2008-08-23 17:58 i mean i suck at reading 2008-08-23 17:58 heh 2008-08-23 17:58 comprehension 2008-08-23 17:59 oh right 2008-08-23 17:59 it is about measuring blocks in sectors 2008-08-23 17:59 blah 2008-08-23 17:59 bleah 2008-08-23 17:59 hrm maybe on tmpfs 2008-08-23 17:59 it's ok, I don't need the underlying volume that big 2008-08-23 17:59 well it would be nice to test 2008-08-23
18:00 the handling of the bitmaps 2008-08-23 18:00 maybe xfs? 2008-08-23 18:00 Max file size 8 exabytes 2008-08-23 18:00 hrm nope, tmpfs fail as well 2008-08-23 18:00 what is this - 1 byte bs? 2008-08-23 18:01 cant be signed? 2008-08-23 18:01 that would be ... retarded 2008-08-23 18:01 that's not it 2008-08-23 18:01 it's just less by one byte 2008-08-23 18:01 nonsensicle 2008-08-23 18:01 nonsensical 2008-08-23 18:03 limit on tmpfs also seems to be 2 TB - 1 2008-08-23 18:05 no 2008-08-23 18:05 whoops i was in the wrong dir 2008-08-23 18:07 tmpfs is actually a bit over 256G 2008-08-23 18:07 I wonder what the limit is there 2008-08-23 18:07 swapper most likely 2008-08-23 18:07 what about ramfs? 2008-08-23 18:09 http://lkml.org/lkml/2004/1/30/101 2008-08-23 18:09 something related to total memory size 2008-08-23 18:10 suggested workaround of echo 1 >/proc/sys/vm/overcommit_memory 2008-08-23 18:10 didn't help 2008-08-23 18:12 wow ramfs is the ticket 2008-08-23 18:13 -rw-r--r-- 1 shapor shapor 8.0E 2008-08-23 18:13 t 2008-08-23 18:13 ramfs isn't phased at 8EB even 2008-08-23 18:13 dd runs out of offset first ;) 2008-08-23 18:14 dd: offset too large: cannot truncate to a length of seek=8808038400000 (1048576-byte) blocks 2008-08-23 18:15 I put a shot of kahlua in my coffee just for you 2008-08-23 18:15 don't know where anna stashed the wiskey or would have done it properly 2008-08-23 18:15 she's always one step ahead of me ;-) 2008-08-23 18:16 so we use ramfs for testing? 2008-08-23 18:16 yes 2008-08-23 18:16 good sleuthing 2008-08-23 18:17 ramfs on 64 bit 2008-08-23 18:18 because ramfs on 32 bit maxes out at 2^44 2008-08-23 18:18 16 TB 2008-08-23 18:18 due to the page cache index 2008-08-23 18:19 both for volumes and files 2008-08-23 18:20 did btrfs prototype in userspcae first too? 2008-08-23 18:20 anyway, I will work with the ext3 limit for the first big file test. 
We can still create a 1EB file in tux3 whatever the physical volume size 2008-08-23 18:20 I doubt it 2008-08-23 18:21 cut & paste of something most likely 2008-08-23 18:21 well 2008-08-23 18:21 don't have any clue 2008-08-23 18:21 tux3 is a cut n paste of ddsnap in part 2008-08-23 18:31 ok, what do we do if during a file create the inode creation and allocation succeeds but the dirent creation fails? 2008-08-23 18:32 probably better consult good old ext2 for guidance 2008-08-23 18:44 roll back? 2008-08-23 18:45 to most recent snapshot? 2008-08-23 18:45 no jsut give back the inode 2008-08-23 18:45 seeing as the changes only happened in a buffer that hasn't been committed yet, that is practical 2008-08-23 18:45 and the right thing to do 2008-08-23 18:45 just invalidate the buffer, done 2008-08-23 18:45 even if it has 2008-08-23 18:45 why does it matter 2008-08-23 18:45 orphan inode 2008-08-23 18:46 invalidating the buffer is right, and really nice 2008-08-23 18:46 it's a true rollback 2008-08-23 18:46 thanks 2008-08-23 18:47 why would dirent creation fail, io error or something? 2008-08-23 18:49 out of memory 2008-08-23 18:49 out of disk space 2008-08-23 18:49 udev fucked up? 
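The rollback agreed on here works because nothing has been committed yet. The inode allocation only dirtied an in-memory buffer, so discarding that buffer undoes it. The sketch below models a buffer as a cached copy of one on-disk block. All names (struct buffer, bread, invalidate, make_inode, add_dirent, create_file) are hypothetical stand-ins, not Tux3 or kernel functions. add_dirent fails on demand to exercise the error path. The inode is created before the dirent, which is the order settled on earlier.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define BLOCK 64  /* tiny illustrative block size */

/* Hypothetical buffer: an in-memory copy of one on-disk block. */
struct buffer {
	unsigned char data[BLOCK];
	bool dirty;
};

static unsigned char disk[BLOCK];  /* stands in for the committed block */

static void bread(struct buffer *b)
{
	memcpy(b->data, disk, BLOCK);
	b->dirty = false;
}

/* Throw away uncommitted changes; only committed contents remain. */
static void invalidate(struct buffer *b)
{
	bread(b);
}

/* Hypothetical inode allocation: mark slot ino as in use. */
static void make_inode(struct buffer *itable, unsigned ino)
{
	assert(ino < BLOCK);
	itable->data[ino] = 1;
	itable->dirty = true;
}

/* Hypothetical dirent insertion that fails when asked to. */
static int add_dirent(bool fail)
{
	return fail ? -ENOSPC : 0;
}

/* Create the inode, then the dirent; on failure, roll back by
 * invalidating the inode table buffer. */
static int create_file(struct buffer *itable, unsigned ino, bool fail)
{
	int err;

	make_inode(itable, ino);
	err = add_dirent(fail);
	if (err)
		invalidate(itable);
	return err;
}
```

The sketch assumes the buffer holds no other uncommitted changes. If other files were already created in the same buffer since the last commit, only the failed inode's own bytes would need reverting.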
2008-08-23 18:50 who knows 2008-08-23 18:50 yes and io error 2008-08-23 18:50 bad sector 2008-08-23 18:51 uncorrectable ecc error they call it these days 2008-08-23 18:51 or a cascading failure resulting from that, or bad cpu memory 2008-08-23 18:51 the chance of not corrupting any data when that happens seems low 2008-08-23 18:52 in every case the right thing is to invalidate the buffer 2008-08-23 18:52 yes 2008-08-23 18:52 ext2/3 is very good about no corrupting disk in cases like that 2008-08-23 18:53 that's partly why it's still our standard fs even with much sexier things about 2008-08-23 18:53 yeah 2008-08-23 18:53 true 2008-08-23 18:53 something tells me ZFS is 5 years away from that 2008-08-23 18:53 at least 2008-08-23 18:54 even with the hyped checksumming 2008-08-23 18:54 doesn't save you from a fs bug 2008-08-23 18:54 or memory errors 2008-08-23 18:54 the two most common causes of corruption 2008-08-23 18:54 which seems infinitely more likely than a non detected io corruption 2008-08-23 18:54 right 2008-08-23 18:55 even when the data center i ran hit 115 degrees ambient with crappy old ide drives 2008-08-23 18:55 we didnt see any of that 2008-08-23 18:55 some drives did die 2008-08-23 18:56 but it is obvious 2008-08-23 18:56 io errors up the ass 2008-08-23 18:56 ecc failures is mostly a rather successfull marketing ploy from vendors 2008-08-23 18:56 or ecc uncaught failures I meant 2008-08-23 18:57 ACTION gone skating 2008-08-23 18:57 reminds me of kmfdm lyrics about their music 2008-08-23 18:57 "its made my machines cause they dont make mistakes" 2008-08-23 18:57 heh 2008-08-23 18:58 however maybe all this hyped checksumming will convince hardware vendors to remove error checking, so we should still support it eventually 2008-08-23 19:00 it's already supported at replication time 2008-08-23 19:00 more support you mean 2008-08-23 19:25 yeah, what if you're not replicating 2008-08-23 19:45 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has 
joined #tux3 2008-08-23 21:41 shapor, then replicate ;-) 2008-08-23 21:41 another question is, what if you're not willing to put aside the space to keep a snapshot to checksum against 2008-08-23 21:42 answer to that may be: only keep the checksums. But for a snapshot, see? 2008-08-23 21:43 imho, checksuming every read is braindamage unless you have hardware sitting idle 2008-08-23 21:43 in which case you spent too much money on your box, probably 2008-08-23 21:43 now... 2008-08-23 22:31 -!- konrad(~konrad@c-24-16-74-109.hsd1.wa.comcast.net) has joined #tux3 2008-08-23 22:31 -!- flips(~phillips@phunq.net) has joined #tux3 2008-08-23 22:31 -!- bh(~billh@ip68-107-26-122.sd.sd.cox.net) has joined #tux3 2008-08-24 02:41 88 interested observers on the mailing list 2008-08-24 02:42 needs to translate into more commentary 2008-08-24 04:17 -!- pgquiles(~pgquiles@239.Red-83-41-113.dynamicIP.rima-tde.net) has joined #tux3 2008-08-24 04:21 flips: is lvm3 extent aware ? 2008-08-24 04:21 lvm4 is 2008-08-24 04:22 um 2008-08-24 04:22 sorry! 2008-08-24 04:22 it is just a block device 2008-08-24 04:22 hey, nice to see that you're awake now 2008-08-24 04:22 that means you throw bio structs at it 2008-08-24 04:22 ACTION just finished clubbing around San Francisco 2008-08-24 04:22 I sped thorugh LA today to get up there 2008-08-24 04:22 here 2008-08-24 04:22 ACTION got back from death race 2000 not too long ago 2008-08-24 04:23 I would have stopped if I had time, beside we saw each other not that long ago 2008-08-24 04:23 what is that ? 2008-08-24 04:23 bios are kinda like extents 2008-08-24 04:23 beyond that, lvm knows nothing about it 2008-08-24 04:23 so does that mean that the sha1 hash can be done a per extent basis in lvm4 ? 2008-08-24 04:23 it has a concept called extents 2008-08-24 04:23 but it isn't extents 2008-08-24 04:23 it is just fixed size multiples of the lvm allocation unit 2008-08-24 04:23 what the purpose of putting that in the raid layer ? 
2008-08-24 04:24 there is no lvm4, I mispoke 2008-08-24 04:24 sha1 is a stupid hash to use 2008-08-24 04:24 it is ridiculously expensive to compute 2008-08-24 04:24 we're not doing crypto 2008-08-24 04:25 oh 2008-08-24 04:25 but you were thinking about hashed storage 2008-08-24 04:25 that is, content addressed storage 2008-08-24 04:25 sha1 is still pretty extravagant 2008-08-24 04:26 anyway, why the raid layer vs below it? 2008-08-24 04:26 don't know 2008-08-24 04:26 you have to reassemble your raid bits to do your content hash on them 2008-08-24 04:26 well, what else could be used ? 2008-08-24 04:26 depending on the properties of the hash it might be hard to do otherwise 2008-08-24 04:26 xor 2008-08-24 04:26 check out dx_hack_hash 2008-08-24 04:27 it performs well and is cheap to compute 2008-08-24 04:27 performs => distributes evenly 2008-08-24 04:27 sha1 is better, but not so much better as to be worth the cpu load 2008-08-24 04:28 it is also a much wider hash 2008-08-24 04:28 you can't use a 32 bit hash for this without a collision scheme 2008-08-24 04:30 well, what did you think about getting some kind of generic blob support to universally represetn chnk of data in a file so that you can avoid replicating it ? 2008-08-24 04:30 it would apply to uncompressed as well as compress storage 2008-08-24 04:30 http://en.wikipedia.org/wiki/Content-addressable_storage 2008-08-24 04:30 I wouldn't expect a radi layer to be a aware of that stuff 2008-08-24 04:31 ACTION can't chat much longer 2008-08-24 04:31 kay, I can't either 2008-08-24 04:31 got to consider the sleep thing 2008-08-24 04:32 interesting 2008-08-24 04:32 some kind of cheap hash that results in a protocol exchange between upstream and downstream to see if the blocks are really identical would be useful 2008-08-24 04:33 last bit, think about cdta localit and online disk checking. 2008-08-24 04:33 internet is dying 2008-08-24 04:33 night 2008-08-24 04:33 cdta? 
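The cheap-hash-then-verify idea can be sketched with the ext3 htree hash mentioned above. The hash function is written from memory of dx_hack_hash in the Linux ext3 code, adapted here to unsigned bytes. Treat it as a close approximation rather than a verbatim copy. The hash is only 32 bits wide, so a match is treated as a hint and confirmed with a byte comparison. That is the collision handling a narrow hash needs. The memcmp stands in for the upstream/downstream exchange that checks whether blocks are really identical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cheap 32-bit hash modelled on ext3's dx_hack_hash (from memory). */
static uint32_t hack_hash(const unsigned char *p, size_t len)
{
	uint32_t hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;

	while (len--) {
		uint32_t hash = hash1 + (hash0 ^ (uint32_t)(*p++ * 7152373));

		if (hash & 0x80000000)
			hash -= 0x7fffffff;
		hash1 = hash0;
		hash0 = hash;
	}
	return hash0 << 1;
}

/* Blocks are duplicates only if the hashes match AND the bytes do;
 * with a 32-bit hash, collisions are expected and must be checked. */
static bool same_block(const unsigned char *a, const unsigned char *b,
                       size_t len)
{
	return hack_hash(a, len) == hack_hash(b, len) &&
	       memcmp(a, b, len) == 0;
}
```

In practice the hash of each stored block would be kept rather than recomputed. The byte comparison then runs only on the rare hash hits, which keeps the CPU cost far below SHA-1's.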
2008-08-24 04:33 ah 2008-08-24 04:33 data locality 2008-08-24 04:33 sure 2008-08-24 04:34 been thinking indeed 2008-08-24 04:34 see the log 2008-08-24 04:34 for performance reasons, they should be considered together 2008-08-24 04:34 how far up ? 2008-08-24 04:34 is anything I say useful ? 2008-08-24 04:35 yes 2008-08-24 04:35 very 2008-08-24 04:35 check it out later 2008-08-24 04:35 shapor and me 2008-08-24 04:35 conclusion is that storing a thing that is kind of like a snapshot but does not have the actual snapshot data, just hashes of it 2008-08-24 04:36 would be useful for checking 2008-08-24 04:36 and efficient 2008-08-24 04:36 vs the stupid braindamage that zfs has popularized 2008-08-24 04:36 anything you say? 2008-08-24 04:36 yes 2008-08-24 04:37 inspired the hash snap idea 2008-08-24 05:56 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-08-24 12:47 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-08-24 14:07 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-08-24 14:36 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-08-24 15:18 iattr.c decoded an attribute list 2008-08-24 15:18 now an encoder 2008-08-24 15:18 or maybe I should check it in as is 2008-08-24 15:19 make shapor's eyes bleed 2008-08-24 15:21 check it in 2008-08-24 15:22 should be more exciting than just "return 0" 2008-08-24 15:22 it is 2008-08-24 15:22 it's real gouge your eyes out stuff 2008-08-24 15:23 strigi dox suck 2008-08-24 15:23 :-( 2008-08-24 15:24 pgquiles, just put out a call for tech writers 2008-08-24 15:24 flips: problem is tech writers would first need to understand what each class, method, etc do, which is the difficult part of these docs 2008-08-24 15:25 I'm not even sure which classes are internal and which ones are intended for use by applications! 
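The "hash snap" conclusion above keeps something like a snapshot, but holds only per-block hashes instead of the data. It could work roughly like this. This sketch makes three assumptions:

* Fixed 4 KiB blocks.
* A 32-bit FNV-1a hash, chosen here only for brevity. It is not something proposed in the discussion.
* A volume modelled as an in-memory array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096

/* 32-bit FNV-1a, standing in for whatever checksum is chosen. */
static uint32_t fnv1a(const unsigned char *p, size_t len)
{
	uint32_t h = 2166136261u;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Record one hash per block: the "hash snapshot". */
static void hash_snapshot(const unsigned char *vol, size_t nblocks,
                          uint32_t *hashes)
{
	for (size_t i = 0; i < nblocks; i++)
		hashes[i] = fnv1a(vol + i * BLOCK_SIZE, BLOCK_SIZE);
}

/* Re-read the blocks and return the first one whose hash no longer
 * matches, or -1 if all match. */
static long first_bad_block(const unsigned char *vol, size_t nblocks,
                            const uint32_t *hashes)
{
	for (size_t i = 0; i < nblocks; i++)
		if (fnv1a(vol + i * BLOCK_SIZE, BLOCK_SIZE) != hashes[i])
			return (long)i;
	return -1;
}
```

Storage cost is 4 bytes per 4 KiB block, roughly 0.1%. A mismatch only signals corruption for blocks that should not have changed since the hash snapshot was taken. Blocks rewritten afterwards would be compared against a newer hash snapshot instead.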
2008-08-24 15:25 g99 -g -Wall iattr.c && ./a.out 2008-08-24 15:25 block = 1234, depth = 1 2008-08-24 15:25 I guess I will make it decode more than one attr before checking in 2008-08-24 15:25 pgquiles, decent tech writers understand that stuff 2008-08-24 15:26 jon corbet for example 2008-08-24 15:26 but he doesn't work for free... always 2008-08-24 15:42 shapor, iattr.c skeleton decoder is in 2008-08-24 15:42 next, skeleton encoder 2008-08-24 15:42 then really use both 2008-08-24 15:44 ACTION considers the wisdom of changing his skate wheels 2008-08-24 15:44 http://www.officemax.com/omax/catalog/sku.jsp?skuId=21607263 2008-08-24 15:44 nifty 2008-08-24 15:48 sounds big 2008-08-24 16:10 getting close to sk8 oclock 2008-08-24 17:54 getting really close to sk8 oclock 2008-08-24 18:38 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-08-24 21:50 ACTION has a relatively solid internet connection now 2008-08-24 22:59 flips: pull from me, some minor fixes 2008-08-24 23:00 hopefully not to iattr.c 2008-08-24 23:00 no 2008-08-24 23:00 kay 2008-08-24 23:00 now how do I do that again 2008-08-24 23:00 forgot to write down te prescription 2008-08-24 23:00 oh 2008-08-24 23:00 it was hard 2008-08-24 23:00 you have a bunch of heads 2008-08-24 23:00 and when I pulled I got them all 2008-08-24 23:01 hg pull static-http://shapor.com/tux3/shapor-tux3 2008-08-24 23:01 was a major pain to get rid of them 2008-08-24 23:01 right 2008-08-24 23:01 hrm 2008-08-24 23:01 but not just that 2008-08-24 23:01 best is to clone, then pull into that, then selectively pull from the local copy 2008-08-24 23:01 i should get rid of the heads first 2008-08-24 23:01 well there are probably better ways 2008-08-24 23:01 hm 2008-08-24 23:01 right 2008-08-24 23:02 make a clean repo to pull from 2008-08-24 23:02 that's what the kernel crowd does 2008-08-24 23:02 I'll peek at the repo online 2008-08-24 23:03 why not set up a cgi? 
2008-08-24 23:04 ok i recloned 2008-08-24 23:04 and put one of my changes is (fix dependencies in Makefile) 2008-08-24 23:07 have you tried hg view? 2008-08-24 23:07 no 2008-08-24 23:07 try it :-) 2008-08-24 23:07 I can see your repo is clean right away with it 2008-08-24 23:08 $ hg view 2008-08-24 23:08 /usr/bin/env: wish: No such file or directory 2008-08-24 23:08 heh 2008-08-24 23:08 make it there 2008-08-24 23:13 ey 2008-08-24 23:14 I've got a more solid internet connection right now 2008-08-24 23:14 so I can talk for a bit before it drops out 2008-08-24 23:14 ACTION is still prepping for Burning Man 2008-08-24 23:16 shapor, merged 2008-08-24 23:16 not painful 2008-08-24 23:16 only shakey spot was forgetting the url 2008-08-24 23:16 which I have written down this time 2008-08-24 23:16 bh, hi 2008-08-24 23:17 I'm not clear on why prepping is required 2008-08-24 23:17 probably I just don't understand 2008-08-24 23:19 flips: should inode test be asserting ? 2008-08-24 23:19 no 2008-08-24 23:19 (just pulled) 2008-08-24 23:19 perhaps 64 bit bug 2008-08-24 23:19 i'll look 2008-08-24 23:20 outputs about 15 lines then [5972] brelse: Failed assertion "buffer->count" 2008-08-24 23:20 doesn't assert for me 2008-08-24 23:20 double free 2008-08-24 23:21 included the filename? 2008-08-24 23:21 should clean that up 2008-08-24 23:21 yes i included the filename 2008-08-24 23:21 valgrind says uninitialized value 2008-08-24 23:21 in tuxopen 2008-08-24 23:22 let me check here 2008-08-24 23:23 yup 2008-08-24 23:23 just a sec 2008-08-24 23:25 http://pastebin.com/m6e86586e 2008-08-24 23:26 fixed 2008-08-24 23:26 you beat me to it? 
2008-08-24 23:26 no that is the output 2008-08-24 23:26 valgrind really blew up 2008-08-24 23:26 illegal opcode 2008-08-24 23:26 i havne't seen that before 2008-08-24 23:27 I inherited that dodgy interface from ripping the ext2 dir code 2008-08-24 23:27 fragile 2008-08-24 23:27 sure blame someone else :P 2008-08-24 23:28 well I'm the one who didn't run valgrind 2008-08-24 23:28 the encode/decode verges on pretty now, does it not? 2008-08-24 23:28 its pretty obvious you aren't using the makefile 2008-08-24 23:28 I use make 2008-08-24 23:29 fairly often 2008-08-24 23:29 not for running the test 2008-08-24 23:29 but not in development, usually just before or after a commit 2008-08-24 23:29 right 2008-08-24 23:29 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-08-24 23:29 hi tim 2008-08-24 23:29 hey shapor 2008-08-24 23:29 hiyah tim 2008-08-24 23:29 u missed a great skate today 2008-08-24 23:30 flips! 2008-08-24 23:30 dude! 2008-08-24 23:30 I had a pretty good one 2008-08-24 23:30 ;-_ 2008-08-24 23:30 doing a little faux grinding 2008-08-24 23:30 skating on the skateboard obstacles 2008-08-24 23:30 really? 2008-08-24 23:30 wow 2008-08-24 23:30 tim_dimm: did you get my email? 2008-08-24 23:30 about chris? 2008-08-24 23:30 yeah 2008-08-24 23:31 yeah, were you involved in that? 2008-08-24 23:31 we went to ikea tonight, so I didn't have time to respond 2008-08-24 23:31 ah, sounds... manly 2008-08-24 23:31 I also had to clean 40 mini bearings for my race 2008-08-24 23:31 that course looks amazing 2008-08-24 23:31 I'm crashing guys. just logged on to plug in my phone 2008-08-24 23:32 he sent a pic of the road rash he got longboarding downhill 2008-08-24 23:32 night 2008-08-24 23:32 night 2008-08-24 23:32 wanna see the road rash in the am 2008-08-24 23:32 http://tinyurl.com/6ql2h6 2008-08-24 23:32 tim_dimm: g'night 2008-08-24 23:32 I have a special treatment for road rash 2008-08-24 23:33 das ugly 2008-08-24 23:33 wiskey? 
2008-08-24 23:33 hip crash 2008-08-24 23:33 no, trying to remember the name 2008-08-24 23:33 special bandages 2008-08-24 23:33 its late, I'll remember in the am 2008-08-24 23:33 k, crashin 2008-08-24 23:33 anna put mine away 2008-08-24 23:33 see you 2008-08-24 23:33 ttyl 2008-08-24 23:33 tegaderm 2008-08-24 23:40 flips: what is this {en,de}code_{two,four,six,eight} mess? 2008-08-24 23:40 serial encoding/decoding 2008-08-24 23:40 always looks like that or worse 2008-08-24 23:42 maybe a macro could make it cleaner? 2008-08-24 23:42 would make it worse 2008-08-24 23:42 try it 2008-08-24 23:43 make once the dust settles 2008-08-24 23:43 inlines are to be preferred over macros 2008-08-24 23:43 yes good attitude 2008-08-24 23:43 also read some similar code 2008-08-24 23:43 s/make/maybe/ 2008-08-24 23:43 say, xdelta 2008-08-24 23:43 then come back and complain ;-) 2008-08-24 23:44 heh 2008-08-24 23:44 serial coding/decoding never looks pretty because the compiler can help very little 2008-08-24 23:44 the endian conversion is the biggest mess 2008-08-24 23:44 if you can make that pretty, show me 2008-08-24 23:45 btw i fixed some (L) warnings 2008-08-24 23:45 if you want to pull 2008-08-24 23:45 ok 2008-08-24 23:47 why do we care about endianness? 2008-08-24 23:48 on disk format should be whatever is native 2008-08-24 23:48 because if somebody writes a filesystem on a ppc they want to be able to read it on an x86 2008-08-24 23:49 right, just record that in the superblock 2008-08-24 23:50 theres no reason do jerk around with the endianness in the normal case if you never swap dicks 2008-08-24 23:50 disks* 2008-08-24 23:50 all filesystems do endian conversion 2008-08-24 23:51 zfs will store in native format and convert if you read on the other format, that sucks the worst 2008-08-24 23:51 why? 
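Below is a sketch of inline (not macro) encoders in the {en,de}code_{two,four,six,eight} style. It is illustrative, not the code in iattr.c, and the choice of big-endian is an assumption; any fixed order works. Each function writes or reads a fixed byte order through a moving cursor and returns the advanced cursor. The on-disk format is therefore the same whichever CPU wrote it. The six-byte variant covers 48-bit quantities.

```c
#include <assert.h>
#include <stdint.h>

/* Write the low `bytes` bytes of val, most significant first. */
static inline unsigned char *encode_be(unsigned char *at, uint64_t val,
                                       unsigned bytes)
{
	for (unsigned i = bytes; i-- > 0;)
		*at++ = (unsigned char)(val >> (8 * i));
	return at;
}

/* Read `bytes` bytes, most significant first. */
static inline const unsigned char *decode_be(const unsigned char *at,
                                             uint64_t *val, unsigned bytes)
{
	uint64_t v = 0;

	while (bytes--)
		v = (v << 8) | *at++;
	*val = v;
	return at;
}

static inline unsigned char *encode_two(unsigned char *at, uint16_t v)
{ return encode_be(at, v, 2); }
static inline unsigned char *encode_four(unsigned char *at, uint32_t v)
{ return encode_be(at, v, 4); }
static inline unsigned char *encode_six(unsigned char *at, uint64_t v)
{ assert(v < (1ULL << 48)); return encode_be(at, v, 6); }
static inline unsigned char *encode_eight(unsigned char *at, uint64_t v)
{ return encode_be(at, v, 8); }

static inline const unsigned char *decode_two(const unsigned char *at, uint16_t *v)
{ uint64_t t; at = decode_be(at, &t, 2); *v = (uint16_t)t; return at; }
static inline const unsigned char *decode_four(const unsigned char *at, uint32_t *v)
{ uint64_t t; at = decode_be(at, &t, 4); *v = (uint32_t)t; return at; }
static inline const unsigned char *decode_six(const unsigned char *at, uint64_t *v)
{ return decode_be(at, v, 6); }
static inline const unsigned char *decode_eight(const unsigned char *at, uint64_t *v)
{ return decode_be(at, v, 8); }
```

Because every field passes through one pair of helpers, there is a single conversion path. There is no separate "foreign byte order" branch.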
2008-08-24 23:51 that seems right 2008-08-24 23:51 because you have a whole different code patch if somebody goes to a different endian host 2008-08-24 23:52 better to pick a format and stick with it 2008-08-24 23:52 that's why there is a network byte order for example 2008-08-24 23:52 sounds ideal, its a uncommon case 2008-08-24 23:52 could do the same wanking with context senstive conversion there, it just isn't wise 2008-08-24 23:52 everyone has pcs 2008-08-24 23:53 no filesystem will get merged in linux without conversion 2008-08-24 23:53 extept for ramfs/tmpfs 2008-08-24 23:53 and tux3 2008-08-24 23:53 :P 2008-08-24 23:53 welcome to the filesystem world 2008-08-24 23:54 there are some unpretty things that have to be done 2008-08-24 23:54 thats just dumb 2008-08-24 23:54 for no good reason 2008-08-24 23:54 not having a consistent disk format would be way dumber 2008-08-24 23:55 think about it: you end up with all the same code if you do context conversion anyway, plus other code 2008-08-24 23:55 oh wait 2008-08-24 23:55 yeah but its not normally in the code path 2008-08-24 23:55 chances are you end up with two copies of the conversion code 2008-08-24 23:55 it's cruft 2008-08-24 23:55 cpu cost of always converting is small 2008-08-24 23:55 there are dedicated processor instructions that do it 2008-08-24 23:56 ACTION is done grumbling about it being a waste 2008-08-24 23:57 ddsnap doesn't do anything with endianness iirc 2008-08-24 23:59 that has to be fixed before merging 2008-08-24 23:59 it's written in the comments at the top of the file 2008-08-25 00:00 big job, nobody wants to do it, its ugly 2008-08-25 00:02 ok zfs goes a bit too far they suport little and big both in the same filesystem 2008-08-25 00:02 thats just stupid 2008-08-25 00:02 Every data structure in ZFS is written in the byte order of the machine writing it, along with a flag to indicate what byte order was used. 
A ZFS volume on an Opteron machine will be little-endian; one controlled by an UltraSPARC will be big-endian. If you swap the disk between the two machines, it still will work, and the more you write to it, the more it will become optimized for native reading. 2008-08-25 00:03 i was just saying put a little/big flag in the sb 2008-08-25 00:04 but yes, then you need double the conversion code 2008-08-25 00:04 bfd 2008-08-25 00:04 merge collision 2008-08-25 00:04 in iattr.c 2008-08-25 00:05 oops 2008-08-25 00:09 merged 2008-08-25 00:09 was worth it just to see how merge conflicts go in mercurial 2008-08-25 00:09 the answer is: smooth and obvious 2008-08-25 00:09 I love it how it starts the editor for you 2008-08-25 00:10 yeah i did that once 2008-08-25 00:10 yes, smooth and obvious 2008-08-25 00:11 i was pleasantly surprised, thought "this is too easy" 2008-08-25 00:11 sure smacks svn 2008-08-25 00:12 hg does need a way to delete a head 2008-08-25 00:12 though if I had that, I probably would not have tried merging like a good boy 2008-08-25 00:13 it's only just after midnight, I should add another attr maybe 2008-08-25 00:14 link count 2008-08-25 00:14 the last one that's needed for initial prototype I think 2008-08-25 00:15 say, are we going to allow more than 4 billion links to the same file? 2008-08-25 00:16 why is ctime clumped with mode,uid,gid? 2008-08-25 00:17 aren't they normally all set at the same time? 2008-08-25 00:17 ctime?
2008-08-25 00:17 create time 2008-08-25 00:17 change time 2008-08-25 00:17 no it is create, you're right about being set together 2008-08-25 00:17 when the uid/gid/mode are normally set 2008-08-25 00:18 but not likely to be shared 2008-08-25 00:18 true 2008-08-25 00:18 just feels wrong to put ctime in there 2008-08-25 00:18 we can have a separate attr that only encdes ctime, maybe 2008-08-25 00:18 could be wrong indeed 2008-08-25 00:19 it doesn't make a huge difference, about 2 extra bytes/inode 2008-08-25 00:19 to have it separate 2008-08-25 00:19 that's about 5% of the size of an inode 2008-08-25 00:19 basic inode 2008-08-25 00:21 so 16 attribute types supported? or 15? 2008-08-25 00:22 16 2008-08-25 00:22 one of them is "extended attribute" 2008-08-25 00:22 attribute structure and variations can get aribtrarily complicated, so the 16 is just to capture the most common ones 2008-08-25 00:23 right so how do you know how many are stored? 2008-08-25 00:23 the inode dictionary gives the size of the inode 2008-08-25 00:23 ileaf dict 2008-08-25 00:23 how many bits is that? 2008-08-25 00:24 64K 2008-08-25 00:24 but limit is the size of a table block 2008-08-25 00:24 we are going to have to let inodes overflow into the next block 2008-08-25 00:25 added another attribute 2008-08-25 00:25 took about 5 minutes this time 2008-08-25 00:25 sign of a good interface 2008-08-25 00:26 will think about separating out mtime 2008-08-25 00:26 or have an alternate, separate mtime 2008-08-25 00:26 that is probably the way to go 2008-08-25 00:26 you mean ctime? 
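The point that the inode dictionary, not the inode itself, records each inode's size can be illustrated with a cumulative offset table. Each entry holds the end offset of one inode's attributes within the leaf. An inode's size is the difference between its entry and the previous one. The layout and names are hypothetical, not the actual Tux3 ileaf format. The 16-bit entries follow the "64K" answer above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inode leaf: attrs[] holds the encoded attributes of
 * inodes 0..count-1 back to back; dict[i] is the end offset of inode i. */
struct ileaf {
	unsigned count;
	uint16_t dict[8];
	unsigned char attrs[256];
};

/* Size in bytes of inode i's attributes, taken only from the dict. */
static unsigned inode_size(const struct ileaf *leaf, unsigned i)
{
	assert(i < leaf->count);
	return leaf->dict[i] - (i ? leaf->dict[i - 1] : 0);
}

/* Start of inode i's attributes within the leaf. */
static const unsigned char *inode_attrs(const struct ileaf *leaf, unsigned i)
{
	assert(i < leaf->count);
	return leaf->attrs + (i ? leaf->dict[i - 1] : 0);
}
```

A zero-sized entry costs only its dictionary slot. The decoder knows where each inode ends without any per-attribute count.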
2008-08-25 00:26 the presence of mtime with no owner means "inherit" 2008-08-25 00:26 yes ctime 2008-08-25 00:27 could get tricky with multiple versions 2008-08-25 00:27 also might have a separate mtime, for when database-type writes modify the file without changing the size 2008-08-25 00:27 yeah i was thinking that when i read iattr.c 2008-08-25 00:28 the immediate goal is just to get the prototype up 2008-08-25 00:28 optimize for ownership inheritance later 2008-08-25 00:28 I think all the necessary attributes are there now 2008-08-25 00:29 possibly want a blocks count, but can just ignore that to start 2008-08-25 00:29 its funny though, when you trim the size down this much you see how much fat is really there 2008-08-25 00:29 compared to? 2008-08-25 00:29 with the non-inheriting compact metadata storage the vast majority will be redundant mode,owner 2008-08-25 00:30 yes 2008-08-25 00:30 it would be nice to get down to 16 byte minimum inode size 2008-08-25 00:30 24 bytes for a "hello" immediate file 2008-08-25 00:31 why 24? 2008-08-25 00:32 just thinkin what the minimum would be 2008-08-25 00:32 need 5 bytes for hello 2008-08-25 00:32 2 bytes to say this is in immediate data attribute + version 2008-08-25 00:32 so 16 + 7 ~= 24 2008-08-25 00:33 ;-) 2008-08-25 00:33 yeah newline at the end ;) 2008-08-25 00:33 so the whole point of tux3 is to compress the zumastor configuration database? i knew it!! 2008-08-25 00:34 right 2008-08-25 00:35 versus how big on zfs?
2008-08-25 00:35 I shudder to think 2008-08-25 00:35 512 bytes dnode I think 2008-08-25 00:36 but I don't know what a dnode is 2008-08-25 00:36 I imagine it's hugely grosser than that 2008-08-25 00:36 they use 128 bytes minimum for a pointer 2008-08-25 00:36 I don't know if they have immediate data 2008-08-25 00:36 oh they do 2008-08-25 00:36 bits you mean 2008-08-25 00:36 in the 128 byte pointer maybe 2008-08-25 00:36 bytes I mean 2008-08-25 00:36 ACTION blinks 2008-08-25 00:36 not kidding 2008-08-25 00:37 zfs is pretty gross actually 2008-08-25 00:37 it looks best in a brochure 2008-08-25 00:38 sounds like ipv6, but much worse 2008-08-25 00:40 I wonder which one has more deployments 2008-08-25 00:40 ipv6 by a lot 2008-08-25 00:41 its been around.... 15 years? 2008-08-25 00:41 I've never run into one in the wild 2008-08-25 00:41 as a percentage of who _could_ use it, it may be around a tie right now 2008-08-25 00:42 ok, iattr.c should be nearly done for now 2008-08-25 00:42 have to hook it up to inode.c 2008-08-25 00:42 nah, big networks are all doing v6 for backbones 2008-08-25 00:43 any big win? 2008-08-25 00:43 makes routing easier 2008-08-25 00:43 supported in hardware on the big routers 2008-08-25 00:43 you actually get a discount if you talk v6 to them 2008-08-25 00:43 discount? 2008-08-25 00:44 oh 2008-08-25 00:44 on the peering 2008-08-25 00:44 yeah 2008-08-25 00:44 thanks, but no thanks 2008-08-25 00:44 i'll keep my nat 2008-08-25 00:44 it is kind of sad that I now will replace 4 lines with 150 lines 2008-08-25 00:45 few computers need public ips 2008-08-25 00:45 the amount of code that was needed to do the endian conversions + encode/decode attrs 2008-08-25 00:45 why 2008-08-25 00:45 why? 2008-08-25 00:46 typical phillips bloatware code :P 2008-08-25 00:46 somebody should add endian attributes to gcc 2008-08-25 00:46 and reliable, predictable bit fields 2008-08-25 00:47 whats not reliable/predictable about bit fields in gcc?
2008-08-25 00:47 there's no guarantee on where they will end up in the data object 2008-08-25 00:48 so you can't use them to define disk formats 2008-08-25 00:48 code would be a lot less if you could 2008-08-25 00:50 shall we go with the "howmuch" function, or think of a more respectable name? 2008-08-25 00:54 howmuch is respectable 2008-08-25 00:54 it already changed to howbig ;-) 2008-08-25 00:55 how about __iattr_how_many_bytes 2008-08-25 00:55 ooh pretty 2008-08-25 00:56 just preface every function with __ 2008-08-25 00:56 so you always remember which file you're looking at 2008-08-25 00:56 and that the author's _ key was working 2008-08-25 00:57 what about the type of the return value? 2008-08-25 00:57 aren't you supposed to encode that in the name? 2008-08-25 00:57 oh right 2008-08-25 00:59 i threw up a lot working on bind last week 2008-08-25 00:59 you know its going to be bad when you have to look in the "bin" dir for all the source code 2008-08-25 01:00 wow, that was easy to integrate 2008-08-25 01:00 the high level code got about half the size 2008-08-25 01:01 vs the straight struct banging 2008-08-25 01:02 grep -r seems to indicate 64844 occurrences of "isc" in the bind9 source tree 2008-08-25 01:02 isc? 2008-08-25 01:02 oh, thats just lines containing isc 2008-08-25 01:02 internet systems consortium 2008-08-25 01:02 the company that makes bind 2008-08-25 01:03 everything is isc_ 2008-08-25 01:03 how about discdrive?
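Because the C standard leaves the allocation order of bit-fields to the implementation, on-disk headers are usually packed with shifts and masks instead. A sketch, assuming a hypothetical 2-byte big-endian head with the attribute kind in the top 4 bits (enough for the 16 kinds discussed earlier) and a 12-bit version below it; `encode_immediate` and `decode_head` are illustrative names, not tux3 functions. With this layout an immediate-data attribute holding "hello" takes 2 + 5 = 7 bytes, matching the estimate in the log.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical head layout: kind in the top 4 bits, version in the low 12. */
#define KIND_BITS 4
#define VERSION_BITS 12
#define KIND_MASK ((1u << KIND_BITS) - 1)
#define VERSION_MASK ((1u << VERSION_BITS) - 1)

/* Write a 2-byte big-endian head followed by the payload. No bit fields,
 * so the byte layout is the same for every compiler. Returns bytes used. */
static size_t encode_immediate(unsigned char *out, unsigned kind,
			       unsigned version, const void *data, size_t len)
{
	uint16_t head = (uint16_t)((kind & KIND_MASK) << VERSION_BITS |
				   (version & VERSION_MASK));
	out[0] = (unsigned char)(head >> 8);
	out[1] = (unsigned char)(head & 0xff);
	memcpy(out + 2, data, len);
	return 2 + len;
}

/* Recover kind and version from the first two bytes. */
static void decode_head(const unsigned char *in, unsigned *kind,
			unsigned *version)
{
	unsigned head = (unsigned)in[0] << 8 | in[1];
	*kind = head >> VERSION_BITS;
	*version = head & VERSION_MASK;
}
```

For example, `encode_immediate(buf, 5, 42, "hello", 5)` returns 7 and writes the head bytes `0x50 0x2a`.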
2008-08-25 01:03 :-) 2008-08-25 01:03 oh and thats just *lines containing isc* 2008-08-25 01:03 not occurrences 2008-08-25 01:04 if you add up all the bytes "isc" takes up in the bind source its probably an order of magnitude larger than the djbdns source 2008-08-25 01:08 isc_uint32_t isc_random_jitter(isc_uint32_t max, isc_uint32_t jitter); 2008-08-25 01:08 heh 2008-08-25 01:08 oh, now we can have a generic attribute dumper 2008-08-25 01:08 instead of a lame hexdump 2008-08-25 01:08 indeed 2008-08-25 01:09 kudos to you for at least trying to make that stinking thing a little better 2008-08-25 01:10 waste of time 2008-08-25 01:10 what I meant 2008-08-25 01:10 the direct c file includes are going to break pretty soon 2008-08-25 01:11 and it will go to a "proper" makefile 2008-08-25 01:12 for the moment I think I will include iattr.c in ileaf.c 2008-08-25 01:12 the notmain thing will break then 2008-08-25 01:12 ugh 2008-08-25 01:12 why did you start the attrs at 6? 2008-08-25 01:12 maybe it all breaks right now 2008-08-25 01:12 just so I could see them 2008-08-25 01:12 time to change the base to zero, almost 2008-08-25 01:13 though zeros are rather common 2008-08-25 01:13 most of the mistakes were caught by the assert on unknown kind, which would not have been unknown if zero was an attribute 2008-08-25 01:15 good call 2008-08-25 01:19 oh wow, the c file includes/notmain hack didn't break when I included iattr.c in ileaf.c 2008-08-25 01:20 this monster gets to stumble on another cycle 2008-08-25 01:24 http://lwn.net/Articles/112567/ 2008-08-25 01:25 should support xattr out of the gate 2008-08-25 01:31 ah, inode table dump looks much better with proper attribute dump 2008-08-25 01:31 yes, xattr is in from the start, except not the first kernel port 2008-08-25 01:31 got to cut some corners somewhere 2008-08-25 01:32 ah, now I notice that a bogus empty inode table leaf has crept in 2008-08-25 01:33 now that the table dump isn't all noisy 2008-08-25 01:33 1 level btree
0xbf900988 at 64: 2008-08-25 01:33 0x0/0, 4084 free: 2008-08-25 01:33 0x47/1, 4054 free: 2008-08-25 01:33 0x47: ctime 0 mode 81c0 uid 0 gid 0 btree (block 48 depth 1) (30 bytes) 2008-08-25 01:33 0x64/1, 4054 free: 2008-08-25 01:33 0x64: ctime 0 mode 41c0 uid 0 gid 0 btree (block 45 depth 1) (30 bytes) 2008-08-25 01:34 what sort of work is involved in porting to the kernel? 2008-08-25 01:34 have to un c99 it 2008-08-25 01:34 lindent 2008-08-25 01:34 locks 2008-08-25 01:34 hook up to bio interface 2008-08-25 01:34 hook up to vfs interfaces 2008-08-25 01:34 another dozen things I forgot 2008-08-25 01:35 hm, fun 2008-08-25 01:41 whoops, broke make with the #define main tricks 2008-08-25 01:41 I knew that was too easy 2008-08-25 01:48 The ctime--change time--is the time when changes were made to the file or directory's inode (owner, permissions, etc.). The ctime is also updated when the contents of a file change. It is needed by the dump command to determine if the file needs to be backed up. You can view the ctime with the ls -lc command 2008-08-25 01:49 so ctime always gets incremented when mtime does 2008-08-25 01:49 bleah 2008-08-25 01:49 they should probably be bundled 2008-08-25 01:49 what use is that? 2008-08-25 01:49 yes 2008-08-25 01:50 pretty crap feature 2008-08-25 01:50 beyond crap 2008-08-25 01:50 what's your source? 2008-08-25 01:50 so its essentially a copy of the mtime, but also gets updated if you chmod/chown 2008-08-25 01:50 flips: the internets 2008-08-25 01:50 and experimenting 2008-08-25 01:50 thanks for the heads up 2008-08-25 01:50 so yes, refactor that steaming pile 2008-08-25 01:51 are the drugs we have these days as good as the ones those guys were on?
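Starting the attribute kinds at a nonzero base, as with the base of 6 discussed above, lets the unknown-kind check catch zero-filled memory that would otherwise decode as a plausible attribute. A sketch of that check plus a name-based dumper in place of a hexdump; the kind names and values are hypothetical, loosely following the attribute groups mentioned in the log, and the list of one-byte kind codes is a simplification of a real attribute stream.

```c
#include <stdio.h>

/* Hypothetical attribute kinds; names and order are illustrative only.
 * Starting at 6 rather than 0 means zero-filled memory decodes as an
 * unknown kind instead of a plausible attribute. */
enum attr_kind {
	ATTR_BASE = 6,
	MODE_OWNER_ATTR = ATTR_BASE,
	CTIME_SIZE_ATTR,
	MTIME_ATTR,
	DATA_BTREE_ATTR,
	LINK_COUNT_ATTR,
	ATTR_LIMIT
};

static const char *const attr_names[] = {
	[MODE_OWNER_ATTR - ATTR_BASE] = "mode/owner",
	[CTIME_SIZE_ATTR - ATTR_BASE] = "ctime/size",
	[MTIME_ATTR - ATTR_BASE] = "mtime",
	[DATA_BTREE_ATTR - ATTR_BASE] = "data btree",
	[LINK_COUNT_ATTR - ATTR_BASE] = "link count",
};

/* Returns NULL for any kind outside the known range, including 0. */
static const char *attr_name(unsigned kind)
{
	if (kind < ATTR_BASE || kind >= ATTR_LIMIT)
		return NULL;
	return attr_names[kind - ATTR_BASE];
}

/* Print each attribute's name, stopping at the first unknown kind.
 * Returns the number of attributes dumped, or -1 on an unknown kind. */
static int dump_kinds(const unsigned char *kinds, int count)
{
	for (int i = 0; i < count; i++) {
		const char *name = attr_name(kinds[i]);
		if (!name) {
			fprintf(stderr, "unknown attribute kind %u\n",
				(unsigned)kinds[i]);
			return -1;
		}
		printf("%s\n", name);
	}
	return count;
}
```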
2008-08-25 01:51 heh 2008-08-25 01:52 reminds me of a line my car geek friends have 2008-08-25 01:52 80s engine management is the result of 60s drug use 2008-08-25 01:52 what we are going to do is let mtime shadow ctime 2008-08-25 01:52 yeah and only add ctime if needed 2008-08-25 01:52 yes 2008-08-25 01:52 more cruft 2008-08-25 01:53 to debug 2008-08-25 01:53 xfs inodes are 256bytes by default 2008-08-25 01:54 caused some problems with selinux xattrs 2008-08-25 01:54 not enough room for them to fit in 2008-08-25 01:55 ext3 fits them in 2008-08-25 01:55 hey, want to collaborate on an article about filesystems for linux world? 2008-08-25 01:55 I've been asked to write about versioning filesystems 2008-08-25 01:56 zfs, btrfs, tux3 2008-08-25 01:56 sure 2008-08-25 01:56 I'll send email around 2008-08-25 01:56 it will be fun 2008-08-25 01:56 something like proctology 2008-08-25 01:56 get to learn all about them 2008-08-25 01:56 i should try btrfs 2008-08-25 01:56 yes 2008-08-25 02:00 ah the ctime and mtime are actually often different 2008-08-25 02:01 hmm, char * has a special feature that void * does not have 2008-08-25 02:02 you can subtract it from a pointer to anything else, and the other pointer is quietly converted to char * 2008-08-25 02:02 void * causes an error on subtract 2008-08-25 02:02 that is probably a flaw in definition of void * 2008-08-25 02:04 because you can set the mtime using utime() 2008-08-25 02:04 like tar does, when you untar something 2008-08-25 02:05 but you cannot alter the ctime 2008-08-25 02:05 however for the majority of the files we write() to, they will be equal 2008-08-25 02:06 I see 2008-08-25 02:06 ctime can stay in the owner group 2008-08-25 02:06 well 2008-08-25 02:06 no way 2008-08-25 02:06 let me see 2008-08-25 02:07 I was thinking if mtime can shadow it 2008-08-25 02:07 but now it seems it can't 2008-08-25 02:07 mtime could shadow ctime, yes 2008-08-25 02:07 only add mtime if someone sets it with utime() 2008-08-25 02:08 but if 
mtime is explicitly set then it can't shadow ctime 2008-08-25 02:08 it can be set to some time in the past, right? 2008-08-25 02:08 yeah 2008-08-25 02:08 why not have it shadow ctime 2008-08-25 02:08 right 2008-08-25 02:08 unless mtime is present 2008-08-25 02:08 it should 2008-08-25 02:09 and that attribute group should be ctime/size instead of mtime/size 2008-08-25 02:09 mtime gets its own attribute 2008-08-25 02:09 yep 2008-08-25 02:28 done 2008-08-25 02:30 and done for the evening 2008-08-25 02:32 its early 2008-08-25 02:34 mtime also needs to be added on a chmod/chown 2008-08-25 02:34 as a copy of the old ctime (if it doesn't already exist) 2008-08-25 02:35 mtime is also always deleted when ctime is updated due to a write 2008-08-25 03:15 -!- pgquiles(~pgquiles@239.Red-83-41-113.dynamicIP.rima-tde.net) has joined #tux3 2008-08-25 05:17 -!- pgquiles(~pgquiles@6.Red-81-39-193.dynamicIP.rima-tde.net) has joined #tux3 2008-08-25 08:07 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-08-25 10:38 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-08-25 10:49 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-25 14:01 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-25 14:25 crappy python 2008-08-25 14:25 quietly overflows its numerics in spite of running at 1/200th the speed of C 2008-08-25 14:28 oh wait 2008-08-25 14:29 was me 2008-08-25 14:29 python uses a**b for power, not a^b 2008-08-25 14:49 then how do you do bitwise xor? 2008-08-25 14:51 oh i thought you said that the other way 2008-08-25 14:51 why would you think it was ^? 
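The mtime-shadows-ctime rule worked out above can be modeled in a few lines. This is a hypothetical in-memory sketch (`struct itimes` and the helper names are illustrative, not tux3's): an absent mtime reads as ctime; chmod, chown and link count changes, which only touch ctime, first save the old ctime as an explicit mtime; a write drops the explicit mtime again; utime() stores an explicit mtime and, as POSIX specifies, also updates ctime.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory view of an inode's time attributes. */
struct itimes {
	int64_t ctime;
	int64_t mtime;   /* only meaningful when has_mtime is true */
	bool has_mtime;  /* false: mtime shadows ctime */
};

static int64_t get_mtime(const struct itimes *t)
{
	return t->has_mtime ? t->mtime : t->ctime;
}

/* chmod/chown/link count change: inode-only change. Preserve the old
 * mtime as an explicit attribute before ctime moves on. */
static void inode_changed(struct itimes *t, int64_t now)
{
	if (!t->has_mtime) {
		t->mtime = t->ctime;
		t->has_mtime = true;
	}
	t->ctime = now;
}

/* write(): data and inode change together, so mtime can shadow ctime. */
static void data_written(struct itimes *t, int64_t now)
{
	t->ctime = now;
	t->has_mtime = false;
}

/* utime(): mtime may be set to any value, even in the past; ctime is
 * still updated to the current time. */
static void set_mtime(struct itimes *t, int64_t mtime, int64_t now)
{
	t->mtime = mtime;
	t->has_mtime = true;
	t->ctime = now;
}
```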
2008-08-25 14:52 hard to see why python needs xor 2008-08-25 14:52 well 2008-08-25 14:52 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-08-25 14:52 I guess it does 2008-08-25 14:52 haven't finished my first cup of coffee is why 2008-08-25 14:52 navel gazing post on time resolution is up 2008-08-25 14:53 now back to the question of creating an actual exabyte file 2008-08-25 14:53 and that will be 2^60 bytes, not 2^60 less one 2008-08-25 14:55 to do that I think I will store the highest addressable byte in the csize attribute, not the actual size, which is one greater 2008-08-25 14:55 which means zero is not allowed as a csize value 2008-08-25 14:56 we will remove the size attribute instead 2008-08-25 14:56 or I might just spend the extra bit ;-) 2008-08-25 15:26 flips: you've got mail 2008-08-25 15:26 so I do 2008-08-25 15:27 interesting 2008-08-25 15:39 -!- caoliver(~oliver@75-134-208-20.dhcp.trcy.mi.charter.com) has joined #tux3 2008-08-25 15:43 15:40 <== One of the features of btrfs is fs-based softraid with duplicated metadata. So if it detects a bad checksum of the metadata on one disk, it can get it from another location. It can even do that with 1 disk 2008-08-25 15:44 misfeature imho 2008-08-25 15:44 raid doesn't belong in the fs 2008-08-25 15:44 they will be debugging that for ages 2008-08-25 15:45 tux3 may take a more practical approach 2008-08-25 15:45 if it detects an error in metadata, rebuild the index 2008-08-25 15:46 the index is just an accelerator after all 2008-08-25 15:52 how will it detect the error? 2008-08-25 15:52 what index is just an accelerator?
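Storing the highest addressable byte instead of the size lets a file of exactly 2^60 bytes fit in a 60-bit field. A sketch under that assumption (`CSIZE_BITS`, `encode_csize` and `decode_csize` are hypothetical names, and the 60-bit width is the option the log weighs against spending one more bit). Note that in C, as in Python, `^` is XOR, so the power of two is written as a shift. An empty file cannot be encoded (it would need -1), which matches the plan to drop the size attribute in that case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field width for the stored value. */
#define CSIZE_BITS 60
#define CSIZE_MAX ((UINT64_C(1) << CSIZE_BITS) - 1)

/* Store size - 1, the highest addressable byte. Fails for size 0, which
 * would need -1, and for sizes above 2^60. */
static bool encode_csize(uint64_t size, uint64_t *csize)
{
	if (size == 0 || size - 1 > CSIZE_MAX)
		return false;
	*csize = size - 1;
	return true;
}

/* Recover the byte count from the stored highest addressable byte. */
static uint64_t decode_csize(uint64_t csize)
{
	return csize + 1;
}
```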
2008-08-25 15:52 btree index 2008-08-25 15:53 only the leaves matter 2008-08-25 15:53 they each will be identified with the data they reference 2008-08-25 15:53 since btrfs gives less protection to data, that puts it on equal footing 2008-08-25 15:53 ok what if you lose a sector some of your metadata is on 2008-08-25 15:53 both can lose data if redundancy of underlying storage is exceeded 2008-08-25 15:54 which metadata? 2008-08-25 15:54 say a dleaf 2008-08-25 15:54 anything really 2008-08-25 15:54 then you blow a hole in some data, what is new? 2008-08-25 15:54 each dleaf stands on its own 2008-08-25 15:54 with btrfs you just read from a redundant copy 2008-08-25 15:54 doesn't help if a chunk of data was lost 2008-08-25 15:54 what if its an inode table 2008-08-25 15:54 same thing 2008-08-25 15:55 lose a range of inodes 2008-08-25 15:55 it isn't going to happen 2008-08-25 15:55 if it does, so some data is gone 2008-08-25 15:55 but btrfs has a feature which makes that not happen, right? 2008-08-25 15:55 try pulling two disks from a btrfs array and see what happens 2008-08-25 15:55 (oops probably) 2008-08-25 15:55 i am only talking 1 disk 2008-08-25 15:55 no 2008-08-25 15:55 btrfs claims to have such a feature 2008-08-25 15:55 it's wanking 2008-08-25 15:56 job of the raid system 2008-08-25 15:56 ok i'm not referring to the implementation 2008-08-25 15:56 i'm referring to the design 2008-08-25 15:56 stupid design 2008-08-25 15:56 very costly in terms of complexity 2008-08-25 15:56 ok 2008-08-25 15:56 wrong level 2008-08-25 15:56 too complex 2008-08-25 15:56 yes 2008-08-25 15:56 complexity = unreliability 2008-08-25 15:56 ACTION blushes about dleaf 2008-08-25 15:56 heh 2008-08-25 15:57 at least that complexity is confined 2008-08-25 15:57 raid complexity tends to get splatted throughout an entire system 2008-08-25 15:57 if you don't take measures to confine it 2008-08-25 15:57 most people don't really care about losing data on a single disk system 2008-08-25 15:57 its
probably more likely you lose the whole disk than get a bad block anyway 2008-08-25 15:58 true, and the only time it ever happened to me was when the disk stopped and died 2008-08-25 15:58 or upgraded an ubuntu system with root on lvm 2008-08-25 15:58 only two times 2008-08-25 15:58 in 15 years of abusing my disks and filesystems 2008-08-25 15:58 the other claimed feature is being able to read the metadata off the least busy device 2008-08-25 15:59 hah 2008-08-25 15:59 I'll believe that when I see it 2008-08-25 15:59 just have lots of spindles 2008-08-25 15:59 and let it go 2008-08-25 15:59 detecting that seems...hard 2008-08-25 15:59 yes 2008-08-25 15:59 quixotic 2008-08-25 15:59 there are so many things that actually matter 2008-08-25 15:59 like having a light footprint on cache 2008-08-25 16:00 never mind tlb 2008-08-25 16:01 got that guitar player's cd 2008-08-25 16:01 ten bucks and I am a happy puppy 2008-08-25 16:25 serial typos :-/ 2008-08-25 17:12 flips: you've got mail 2008-08-25 17:13 my penis is already long enough, thanks <- somebody made me type that 2008-08-25 17:14 was that somebody shapor? 2008-08-25 17:14 man in the middle attack I think 2008-08-25 17:14 lol 2008-08-25 17:14 nice 2008-08-25 17:15 http://www.shiningsilence.com/dbsdlog/2008/08/25/3041.html 2008-08-25 17:15 reading 2008-08-25 17:18 tim_dimm, you have male 2008-08-25 17:18 it is just about sk8 oclock 2008-08-25 17:49 time for a wheel change 2008-08-25 17:49 these ones ground down to the nub from slaloming down seaview terrace ;-) 2008-08-25 17:50 doubt I'll get out today. maybe. 2008-08-25 17:50 have a 7pm with my spousal unit 2008-08-25 17:50 interviewing the night nurse 2008-08-25 17:51 night nurse? 2008-08-25 17:51 "interviewing the night nurse" sounds like a bad porn 2008-08-25 17:51 shapor and I will drink a toast to you 2008-08-25 17:51 it's on shapor 2008-08-25 17:51 helps out a few nights during first few weeks 2008-08-25 17:51 hey!
2008-08-25 17:51 I should be there for that 2008-08-25 17:51 ! 2008-08-25 17:52 sure should 2008-08-25 17:52 topic swap; these ceramic mini bearings will be fast at maryhill 2008-08-25 17:53 doing a good job scratching the paint off my frames sliding down that rail 2008-08-25 17:53 too bad I only have 16 2008-08-25 17:53 could not convince shapor to try 2008-08-25 17:53 maybe today 2008-08-25 17:53 then there's the barrel the skateboarders jump over 2008-08-25 17:53 should be good for a pretty spectacular crash 2008-08-25 17:54 on thursday, it will be 120 skateboarders vs. 8 inliners 2008-08-25 17:54 we'll be a bunch of dorks 2008-08-25 17:54 they get out of the way more obligingly lately 2008-08-25 17:54 but we might be faster than them 2008-08-25 17:54 I take care to skate close to them when they veer towards me 2008-08-25 17:55 they seem to like it 2008-08-25 17:55 clapping when they land one seems to help too 2008-08-25 17:55 don't have to do that much 2008-08-25 18:00 I wore the front and back 80's down to 74 2008-08-25 18:00 got nice rocker 2008-08-25 18:00 lousy tracking 2008-08-25 18:00 and the middle? 2008-08-25 18:01 this is very autistic of you, btw 2008-08-25 18:02 ;-) 2008-08-25 18:02 76 2008-08-25 18:02 discussing skating on the irc- that's not right 2008-08-25 18:02 because?
2008-08-25 18:02 cause irc is geeky enough 2008-08-25 18:03 :-D 2008-08-25 18:03 skating is an essential part of the design process 2008-08-25 18:03 alan skates a double rocker 2008-08-25 18:03 80's in the middle, 76's on the ends, with the ends raised up 2mm 2008-08-25 18:03 gives him 4mm of rocker 2008-08-25 18:04 explaining why he doesn't go faster than 20mph 2008-08-25 18:04 I need at least 3 mm, only have 2 2008-08-25 18:04 you only have 1mm 2008-08-25 18:04 you measured diameter, not radius 2008-08-25 18:04 right 2008-08-25 18:05 even that makes a big difference 2008-08-25 18:05 I have 2mm in my missions 2008-08-25 18:05 when I set the elastomers to soft 2008-08-25 18:05 2 would do me then 2008-08-25 18:05 yeah you skaters are a bunch of dorks 2008-08-25 18:05 can bias the preload left/right too 2008-08-25 18:05 make big diff on the feel 2008-08-25 18:06 never knew there was so much detail to skating 2008-08-25 18:06 these k2's have no adjustment whatsoever 2008-08-25 18:06 I'll be going for a pair of seba fr1's pretty soon 2008-08-25 18:06 konrad: http://homepage.mac.com/timothyhuber/downhill/iMovieTheater68.html 2008-08-25 18:06 figure out how to get them from europe 2008-08-25 18:06 figured 2008-08-25 18:07 urk, virgin bearings 2008-08-25 18:07 got to push hard 2008-08-25 18:07 ACTION didn't write that either 2008-08-25 18:07 have to take em apart, get the nasty factory grease out and *cough* relube 2008-08-25 18:07 shapor pwning my keyboard probably 2008-08-25 18:07 tim_dimm: oh wow 2008-08-25 18:08 we hit 51.2 on sunday 2008-08-25 18:08 slight tailwind 2008-08-25 18:08 temps were cool 2008-08-25 18:08 jeez 2008-08-25 18:08 traction fell off by 10am 2008-08-25 18:09 just as good, the car clubs came out with their ferraris, porsches, lotus, subbies, etc 2008-08-25 18:09 you should see shapor doing downhill 2008-08-25 18:09 fucker just learned to skate 5 months ago- already hitting 35 2008-08-25 18:09 I'm a skier, front-back balance is a lot easier on us
2008-08-25 18:10 oh yeah 2008-08-25 18:10 although with the 5 wheel skates, that's not a problem 2008-08-25 18:10 slowing down is the problem 2008-08-25 18:10 how does one do that? 2008-08-25 18:10 http://www.dailymotion.com/tag/descente/video/764 2008-08-25 18:10 thought you'd ask 2008-08-25 18:11 that's how the best in the world do it 2008-08-25 18:11 I throw slalom turns, can scrub 15mph in ~25 ft 2008-08-25 18:12 next week is a world cup event in maryhill 2008-08-25 18:12 black & white wheels on the same skate look kinda cool 2008-08-25 18:13 http://www.maryhillfestivalofspeed.com/ 2008-08-25 18:14 wow. 2008-08-25 18:14 most of us are geeks over 40 2008-08-25 18:15 scott peer works at jpl. cassini runs on his nav software 2008-08-25 18:15 warren focke is an astrophysicist at stanford 2008-08-25 18:16 Washington state, not DC? 2008-08-25 18:16 y 2008-08-25 18:17 awesome road 2008-08-25 18:17 http://www.panoramio.com/photos/original/7534977.jpg 2008-08-25 18:17 40 is when you realize your knees can't survive jogging for another 20 years 2008-08-25 18:17 i learned that at 28 when I started skating 2008-08-25 18:18 tim_dimm: looks like eastern washington 2008-08-25 18:18 y 2008-08-25 18:19 90 miles east of portland 2008-08-25 18:21 we should do a tux3 ski trip 2008-08-25 18:23 we need to 2008-08-25 18:24 mount washington 2008-08-25 18:24 got the best snow in the pacific northwest last year 2008-08-25 18:24 and I have an in with the restaurant owner 2008-08-25 18:25 my right wheels show much more asymmetric wear than left 2008-08-25 18:25 got to fix that 2008-08-25 18:25 3 yrs ago, had thigh deep powder at mt baldy, 45 min from la 2008-08-25 18:25 heh 2008-08-25 18:26 many el nina will do it for us again this year 2008-08-25 18:26 maby 2008-08-25 18:26 i wish 2008-08-25 18:26 ACTION was born a powder pig 2008-08-25 18:26 ow 2008-08-25 18:27 konrad: where r u? 2008-08-25 18:27 seattle, washington area 2008-08-25 18:28 so you get what, 20-30 days /yr?
2008-08-25 18:28 pff 2008-08-25 18:28 I'm more lazy than that 2008-08-25 18:28 I think last season I only skied about 10 days 2008-08-25 18:29 i got 100 in '93 2008-08-25 18:29 ACTION skated 10 days in the last 10 days 2008-08-25 18:29 that's the difference between skiing and skating 2008-08-25 18:29 yeah. 2008-08-25 18:29 I'd like to pick it up 2008-08-25 18:29 they feel kind of the same except you can skate whenever you feel like it 2008-08-25 18:29 the only skating I've done was as a kid in a super smooth rink 2008-08-25 18:30 it's the same outside but bigger and bumpier 2008-08-25 18:30 mhm 2008-08-25 18:30 my wife used to be a professor at U W 2008-08-25 18:30 and steeper sometimes 2008-08-25 18:30 fun 2008-08-25 18:31 more cars too 2008-08-25 18:31 more bikinis too 2008-08-25 18:31 :) 2008-08-25 18:32 tim taught me to skate backwards for that very reason 2008-08-25 18:32 heh 2008-08-25 18:32 nice 2008-08-25 18:32 skiing backwards isn't so hard 2008-08-25 18:33 I'm big on skiing sideways 2008-08-25 18:33 I like to roll too 2008-08-25 18:33 no aerials, that is sick 2008-08-25 18:33 skiing sideways? 2008-08-25 18:33 yup 2008-08-25 18:33 fast 2008-08-25 18:33 for fun 2008-08-25 18:34 easy to catch an edge that way 2008-08-25 18:34 yup 2008-08-25 18:34 that's when the roll skillz help 2008-08-25 18:34 come up skiing, don't lose the rhythm 2008-08-25 18:34 heh 2008-08-25 18:34 I don't have those skills :( 2008-08-25 18:34 easy to get 2008-08-25 18:34 I just do my best not to fall 2008-08-25 18:34 just don't care ;-) 2008-08-25 18:34 ah, falling is part of a run for me 2008-08-25 18:35 got boring after not falling for a couple years ;) 2008-08-25 18:35 flips, you should try flips !
2008-08-25 18:35 no way 2008-08-25 18:35 I value my neck 2008-08-25 18:35 I did a little 2008-08-25 18:35 couple feeble attempts 2008-08-25 18:36 ok inline dh then 2008-08-25 18:36 was into springboard diving and trampoline 2008-08-25 18:36 my brother much more so 2008-08-25 18:36 I like to do multiple flips in freefall 2008-08-25 18:36 air is soft 2008-08-25 18:36 hard packed snow is insanity 2008-08-25 18:38 back flips in freefall is fun, each one goes faster 2008-08-25 18:38 because of the way your head and heels catch air when you're tucked 2008-08-25 18:38 front flips require real exertion 2008-08-25 18:40 we need a poll feature on the irc 2008-08-25 18:40 like they have on forums 2008-08-25 18:40 shapor? 2008-08-25 18:40 we could have a poll to see how many want flips to demonstrate 2008-08-25 18:40 lol 2008-08-25 18:42 heh 2008-08-25 18:42 anna gave my rig away years ago 2008-08-25 18:42 then bought life insurance on me ;-) 2008-08-25 18:44 had one of those http://wraggj.people.cofc.edu/skydive_hist.html 2008-08-25 18:44 more than one 2008-08-25 18:44 three at my worst ;) 2008-08-25 18:46 rolling 2008-08-25 18:48 -!- boom(~boom@c-76-117-208-224.hsd1.nj.comcast.net) has joined #tux3 2008-08-25 20:21 -!- caoliver(~oliver@75-134-208-20.dhcp.trcy.mi.charter.com) has left #tux3 2008-08-25 20:41 50 turns, top to bottom of seaside terrace 2008-08-25 20:48 is that not many?
2008-08-25 20:58 that is many 2008-08-25 20:59 got lucky and there were no cars 2008-08-25 20:59 ah 2008-08-25 20:59 I haven't skated in too long 2008-08-25 20:59 my old skates are too small :S 2008-08-25 21:00 skates are cheap 2008-08-25 21:00 really nice ones for $200 2008-08-25 21:00 pff 2008-08-25 21:13 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-08-25 21:14 wow, they are asking $449 au in australia for the skates I paid $200 for and almost got for $150 2008-08-25 21:14 http://www.baysideblades.com.au/inline_skates_dt/inline_skates/k2/k2_frontman.htm 2008-08-25 21:16 hey 2008-08-25 21:16 ACTION is at Hot Chips 2008-08-25 21:16 sounds tasty 2008-08-25 21:17 ah, sushi time for me 2008-08-25 21:17 nice, I need exercise 2008-08-25 21:17 I'm crawling out of my skin right now, maybe do some push ups or something like that later 2008-08-25 21:18 get skates 2008-08-25 21:18 worked out the details of versioning the inode attributes on that skate 2008-08-25 21:21 hm 2008-08-25 21:21 mmm raw fish 2008-08-25 21:28 flips: how will that work 2008-08-25 21:28 versioned attributes? 2008-08-25 21:43 yeah 2008-08-25 21:43 writing a note about it for the list? 2008-08-25 21:44 eventually 2008-08-25 21:44 it's pretty straightforward 2008-08-25 21:44 works just like versioned pointers 2008-08-25 21:44 same algorithms 2008-08-25 21:44 only thing is, when we walk through a collection of attributes instead of computing just one "max ord" value, we compute an array 2008-08-25 21:45 one element for each attribute group 2008-08-25 21:45 then for each attribute group, the operative item is the one with highest ord 2008-08-25 21:46 a single pass through the unordered attribute list does all attributes 2008-08-25 21:46 see the latest checkin for something resembling that (dump_attrs is rewritten) 2008-08-25 21:51 shapor, would it make sense to put link_count together with mtime? 2008-08-25 21:55 why?
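The versioned-attribute lookup described above amounts to one pass over the unordered attribute list that keeps the best ord per attribute group. A sketch with hypothetical types (`struct vattr`, `pick_operative`, `ATTR_GROUPS`): `ord[v]` is assumed to be precomputed from the version tree, 0 when version `v` is not visible from the target version and larger for closer ancestors. How tux3 actually computes ord is not shown here.

```c
#include <stddef.h>

#define ATTR_GROUPS 4  /* hypothetical number of attribute groups */

/* One versioned attribute item; items arrive in no particular order. */
struct vattr {
	unsigned group;    /* attribute group the item belongs to */
	unsigned version;  /* version label the item was written under */
};

/* ord[v] is assumed precomputed for the target version: 0 if version v is
 * not visible from it, otherwise larger for closer ancestors. For each
 * group, best[g] gets the index of the item with the highest ord, or -1. */
static void pick_operative(const struct vattr *attrs, size_t count,
			   const unsigned *ord, int best[ATTR_GROUPS])
{
	unsigned best_ord[ATTR_GROUPS] = { 0 };

	for (int g = 0; g < ATTR_GROUPS; g++)
		best[g] = -1;
	/* Single pass over the unordered list handles every group at once. */
	for (size_t i = 0; i < count; i++) {
		unsigned g = attrs[i].group;
		unsigned o = ord[attrs[i].version];
		if (g < ATTR_GROUPS && o > best_ord[g]) {
			best_ord[g] = o;
			best[g] = (int)i;
		}
	}
}
```

For instance, if the target's ancestry is versions 0, 1, 2 (ords 1, 2, 3) and version 3 is a sibling branch (ord 0), items written under version 3 never become operative.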
2008-08-25 21:55 that rarely changes 2008-08-25 21:55 and they really never change together 2008-08-25 21:55 the question is, does mtime change when link count changes 2008-08-25 21:56 maybe not 2008-08-25 21:56 no why would it? 2008-08-25 21:56 it's a modification? 2008-08-25 21:56 ok 2008-08-25 21:56 forget that 2008-08-25 21:56 no mtime is modification of the file data 2008-08-25 21:56 next consideration is whether link count should be part of the data attribute 2008-08-25 21:56 bearing in mind that there can be multiple data attributes with different versions 2008-08-25 21:57 changing the link count only changes the ctime 2008-08-25 21:57 since its considered an inode change 2008-08-25 21:57 s/considered / 2008-08-25 21:57 / 2008-08-25 21:57 and size changes are way more frequent than link count changes 2008-08-25 21:58 so does not make sense to bundle with ctime/isize 2008-08-25 22:00 hmm, I bet I broke the make 2008-08-25 22:00 yup 2008-08-25 22:12 1 level btree at 64: 2008-08-25 22:12 0 inode(s) starting at 0x0 (4084 free) 2008-08-25 22:12 1 inode(s) starting at 0x47 (4060 free) 2008-08-25 22:12 0x47: mode 81c0 uid 0 gid 0 btree 48/1 2008-08-25 22:12 1 inode(s) starting at 0x64 (4060 free) 2008-08-25 22:12 0x64: mode 41c0 uid 0 gid 0 btree 45/1 2008-08-25 22:12 that 0 inodes block is the original inode table leaf 2008-08-25 22:12 then I set an inode goal way higher 2008-08-25 22:12 so it didn't get used 2008-08-25 22:13 in practice it's always going to get used 2008-08-25 22:13 but still 2008-08-25 22:13 I wonder if I should let a btree be degenerate without a root until something tries to put an inode in it 2008-08-25 22:14 then make the initial leaf hold that first thing instead of assuming its based at inode zero 2008-08-25 22:14 probably not worth any effort 2008-08-26 00:15 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-08-26 03:09 -!- cdk(~chinmay@121.246.34.93) has joined #tux3 2008-08-26 03:15 -!- cdk(~chinmay@121.246.34.93)
has left #tux3 2008-08-26 03:19 -!- cdk(~chinmay@121.246.34.93) has joined #tux3 2008-08-26 03:19 -!- cdk(~chinmay@121.246.34.93) has left #tux3 2008-08-26 03:43 folks 2008-08-26 04:03 -!- pgquiles(~pgquiles@6.Red-81-39-193.dynamicIP.rima-tde.net) has joined #tux3 2008-08-26 04:56 -!- flipz(~phillips@phunq.net) has joined #tux3 2008-08-26 07:23 -!- pgquiles(~pgquiles@189.Red-81-44-176.dynamicIP.rima-tde.net) has joined #tux3 2008-08-26 07:39 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-08-26 09:46 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-26 11:13 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-26 11:41 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-26 12:58 hey 2008-08-26 12:58 ACTION is trying to do final packing before heading to BM 2008-08-26 13:38 -!- cybergirl(~cybergirl@ANantes-257-1-135-233.w90-32.abo.wanadoo.fr) has joined #tux3 2008-08-26 13:54 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-08-26 14:14 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-08-26 16:27 1225 /* We now have enough fields to check if the inode was active or not. 2008-08-26 16:27 1226 * This is needed because nfsd might try to access dead inodes 2008-08-26 16:27 1227 * the test is that same one that e2fsck uses 2008-08-26 16:27 1228 * NeilBrown 1999oct15 2008-08-26 16:27 1229 */ 2008-08-26 16:27 -- http://lxr.linux.no/linux+v2.6.26.3/fs/ext2/inode.c#L1205 2008-08-26 16:27 :p 2008-08-26 16:30 "active" ? 2008-08-26 16:31 ah ESTALE handling 2008-08-26 16:31 hrm wouldn't get invalidated somehow when nlink drops to zero? 2008-08-26 17:16 I haven't plumbed those greasy depths 2008-08-26 17:17 http://lxr.linux.no/linux+v2.6.26.3/fs/ext2/super.c#L330 2008-08-26 17:17 -- nfs hack 2008-08-26 17:17 we can do it somewhat more cleanly I think 2008-08-26 17:49 flipz: trying to distribute some of the kernel port work? 
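The ext2 comment quoted above is about deciding whether an inode named by a file handle is still live. A sketch of that style of test, assuming our reading of the ext2 rule (no links, plus either a zero mode or a nonzero deletion time, means the inode is gone); `struct disk_inode` and `check_live` are hypothetical, and tux3 has no dtime field as far as this log shows.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical decoded on-disk inode fields. */
struct disk_inode {
	uint32_t nlink;
	uint16_t mode;
	uint32_t dtime;  /* deletion time, 0 if never deleted */
};

/* Return -ESTALE when a handle (for example from nfsd) names an inode
 * that is no longer live: no links, and either a cleared mode or a
 * recorded deletion time. An inode with no links that still has a mode
 * and no deletion time passes the check in this sketch. */
static int check_live(const struct disk_inode *di)
{
	if (di->nlink == 0 && (di->mode == 0 || di->dtime != 0))
		return -ESTALE;
	return 0;
}
```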
;) 2008-08-26 17:53 of course 2008-08-26 17:53 you going to skate today? 2008-08-26 17:53 I mean, so far it's been mainly lets sit back and watch 2008-08-26 17:53 except for you 2008-08-26 17:53 as usual 2008-08-26 17:54 and as usual, that gets old after a while 2008-08-26 17:54 I'm sure chris mason went through the same thing 2008-08-26 17:59 once it mounts people will be interested 2008-08-26 18:31 is this like "will it blend?" 2008-08-26 18:32 ah that is a great idea for publicity 2008-08-26 18:32 get a hard drive and a blentec blender 2008-08-26 18:32 a few hard drives 2008-08-26 18:32 "zfs: will it blend?" 2008-08-26 18:33 tux3 could be the only filesystem that destroys the blendtec 2008-08-26 18:33 :P 2008-08-26 18:33 or just do it once and let tux3 blend ;) 2008-08-26 18:34 right on 2008-08-26 18:34 we could rig the blendtec with plastic blades 2008-08-26 18:34 and make smoke come from the motor 2008-08-26 18:34 when we try Tux3 2008-08-26 20:23 -!- flipz(~phillips@phunq.net) has joined #tux3 2008-08-27 00:42 shapor, once it mounts then people will pile on 2008-08-27 00:42 people are already interested 2008-08-27 00:42 but interested lazy people who do not want to help get it to the point of mounting are uninteresting 2008-08-27 00:49 indeed 2008-08-27 00:49 flipz, just read your post a couple times, looking at the ext2 along side 2008-08-27 00:50 annoying on this 1024x768 laptop display 2008-08-27 00:50 a phone? 2008-08-27 00:50 inode.c is still pretty braindamaged 2008-08-27 00:51 does not yet implement the code in the post 2008-08-27 00:51 last checkin was 26 hours ago :( 2008-08-27 00:51 oops 2008-08-27 00:51 forgot to pull 2008-08-27 00:53 also your post didnt get recogized as a response to the previous one 2008-08-27 00:53 it wasn't 2008-08-27 00:53 should have been 2008-08-27 00:54 ah 2008-08-27 00:54 saw that it started with "Re:" 2008-08-27 00:54 why would you do that? 
2008-08-27 00:55 pulled 2008-08-27 00:55 distracted 2008-08-27 00:55 most probably I did reply 2008-08-27 00:55 and something messed up 2008-08-27 00:56 that's why good archivers have "probably in reply to" 2008-08-27 00:56 no the in-reply to header wasn't there 2008-08-27 00:56 you can make it all better by incorporing them into design.html ;-) 2008-08-27 00:57 yes 2008-08-27 00:57 got the subject line from somewhere 2008-08-27 00:57 did not type it in by hand 2008-08-27 00:57 therefore something screwed up 2008-08-27 00:57 it is quite hard actually 2008-08-27 00:57 much harder than coding 2008-08-27 00:57 cutting and pasting? 2008-08-27 00:58 in a cohesive fashion, yes 2008-08-27 00:58 or thinking clearly when faced with a bunch of fuzzy rambling design notes? 2008-08-27 00:58 :) 2008-08-27 00:58 thinking hurts my head too 2008-08-27 00:58 try to avoid it whenever possible 2008-08-27 00:59 getting that all together would take a whole day of sitting down with it all really 2008-08-27 00:59 dont have the time right now :( 2008-08-27 01:00 don't get it all together then 2008-08-27 01:00 just get one piece of it together 2008-08-27 01:00 yeah 2008-08-27 01:00 the koreans have a saying: starting is half 2008-08-27 01:00 heh 2008-08-27 01:00 since the koreans are the masters of lazy, it is important for them to have their motivators all lined up in a row 2008-08-27 01:11 there, another commit 2008-08-27 01:11 send html ;-) 2008-08-27 01:31 heh 2008-08-27 01:37 ah, I just realized why the inum allocation goal keeps changing 2008-08-27 01:37 because I decided to make it the same as the block allocation goal 2008-08-27 01:37 for now 2008-08-27 01:38 probably not going to stay that way 2008-08-27 01:38 but it is ok to start with 2008-08-27 01:38 planned that carefullly then forgot I did it ;-) 2008-08-27 01:38 needs a comment 2008-08-27 01:45 /* 2008-08-27 01:45 * For now the inum allocation goal is the same as the block allocation 2008-08-27 01:45 * goal. 
This gives us a maximum inum density of one per block and 2008-08-27 01:46 * should give pretty good spacial correlation between inode table blocks 2008-08-27 01:46 * and file data belonging to those inodes provided somebody sets the 2008-08-27 01:46 * block allocation goal based on the directory the file will be created. 2008-08-27 01:46 */ 2008-08-27 01:46 will be in I mean 2008-08-27 02:47 flipz: http://shapor.com/tux3/shapor-tux3/doc/design.html 2008-08-27 02:47 wee 2008-08-27 02:47 i've started dropping the pieces in the original post 2008-08-27 02:47 ooh, pretty 2008-08-27 02:48 growing in to a real doc 2008-08-27 02:48 should we check it into the repo yet? 2008-08-27 02:48 no, needs a lot more work, an hour ago it was just the 2008-July.txt mailing list archive from your mailman 2008-08-27 02:49 -!- jennyf(~jennyf@ANantes-257-1-135-233.w90-32.abo.wanadoo.fr) has joined #tux3 2008-08-27 02:49 although, it is the only copy 2008-08-27 02:49 hi jennyf 2008-08-27 02:49 there is this too: http://tux3.org/design.html 2008-08-27 02:49 yeah i looked at that 2008-08-27 02:50 mostly rubbish? 2008-08-27 02:50 is it anything more than html-ized lkml post? 
2008-08-27 02:50 microscopically more 2008-08-27 02:50 didn't look like it was 2008-08-27 02:50 hm 2008-08-27 02:50 there are a few bits 2008-08-27 02:51 worth not killing 2008-08-27 02:51 i've only done minor editing, removing list like "i forgot to mention this in my original post:" 2008-08-27 02:51 the phtree part 2008-08-27 02:51 ok 2008-08-27 02:51 just keep posting to the list and i'll extract 2008-08-27 02:51 ;) 2008-08-27 02:51 "new user interfaces" 2008-08-27 02:52 other things like inode attributes have been completely superceded I think 2008-08-27 02:52 should not take too long to snarf the few bits that aren't treated better elsewhere 2008-08-27 02:53 I think it's just the two I mentioned 2008-08-27 02:53 ok 2008-08-27 02:54 actually, maybe i will make a commit with that in it 2008-08-27 02:54 simply because its my only copy 2008-08-27 02:54 want to pull it in? 2008-08-27 02:54 sure 2008-08-27 02:54 just say when 2008-08-27 02:55 one quick scan for things that jump out and rip out your eyeballs 2008-08-27 02:55 committed 2008-08-27 02:55 ACTION looks for the pull address 2008-08-27 02:57 hg view is a godsend for this process 2008-08-27 02:57 I wish it didn't use such an ugly widget set though 2008-08-27 03:01 still haven't seen it 2008-08-27 03:01 oh btw 2008-08-27 03:01 i had the first hg fail today 2008-08-27 03:01 my inbox got spammed with cron failures, trying to pull from you was failing 2008-08-27 03:01 for 12 hours or so 2008-08-27 03:02 when i ran hg pull on the command line, it complained that it needed a lock 2008-08-27 03:02 hmm 2008-08-27 03:02 there was some stuck hg pull process 2008-08-27 03:02 why was the pull failing? 2008-08-27 03:02 your side? 2008-08-27 03:02 stupidly i killed it 2008-08-27 03:02 or mine? 
2008-08-27 03:02 yeah on my side 2008-08-27 03:02 i should have tried to figure out why it was hung before i killed it 2008-08-27 03:03 hopefully it will happen again 2008-08-27 03:03 yeah, we'll see 2008-08-27 03:04 well there it is 2008-08-27 03:05 well I wonder if I am going to get my 1 exabyte file in 8 k volume demo for tomorrow 2008-08-27 03:05 maybe not 2008-08-27 03:05 I'll relax a little on that 2008-08-27 03:06 other important stuff is getting done too 2008-08-27 03:06 yeah, as long as progress marches on 2008-08-27 03:07 I think I will add the logic to round down the inode table split boundaries to multiples of some binary number like 64 2008-08-27 03:07 at that point I might have to play the shapor card 2008-08-27 03:07 to expunge the bugs 2008-08-27 03:07 cuz its a little hairy already 2008-08-27 03:07 not a great deal of code, but logic is subtle 2008-08-27 03:08 whyd you change your nick 2008-08-27 03:08 z mean something special? 2008-08-27 03:08 huh? 2008-08-27 03:08 what nick? 2008-08-27 03:08 hah 2008-08-27 03:08 it means the other one was in use 2008-08-27 03:08 because I went oom and crashed 2008-08-27 03:08 linux is ugly that way 2008-08-27 03:08 eek 2008-08-27 03:09 without ulimits 2008-08-27 03:09 happens regularly 2008-08-27 03:09 sucks 2008-08-27 03:09 set ulimits ;) 2008-08-27 03:09 I don't, because I want to feel that pain 2008-08-27 03:09 and fix it one day 2008-08-27 03:09 speaking of feeling pain 2008-08-27 03:09 watched gladiator all the way through 2008-08-27 03:09 buy more ram isn't a solution really 2008-08-27 03:09 you need to see it 2008-08-27 03:09 no 2008-08-27 03:09 since ff will just eat it up 2008-08-27 03:10 crap on my system expands to fill all space 2008-08-27 03:10 crap being mainly firefox 2008-08-27 03:10 with all those porn tabs open 2008-08-27 03:10 yes 2008-08-27 03:10 pitiful 2008-08-27 03:10 russion porn tabs, the worst 2008-08-27 03:10 so, is windows better for something then... porn? 
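Rounding the inode table split boundaries down to a multiple of a power of two such as 64, as proposed above, comes down to a single mask operation. The sketch below shows that mask and adds a fallback for the case where rounding would collapse the lower leaf. `choose_split_inum` and its fallback rule are hypothetical illustrations, not the policy Tux3 adopted.

```c
#include <stdint.h>

/* Round x down to a multiple of align; align must be a power of two. */
static inline uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
    return x & ~(align - 1);
}

/* Hypothetical split policy for an inode table leaf whose inum range
 * starts at base: prefer a boundary that is a multiple of align, but if
 * rounding would land at or below base (leaving the lower leaf with no
 * inum range of its own), fall back to the unrounded median inum. */
static uint64_t choose_split_inum(uint64_t base, uint64_t median, uint64_t align)
{
    uint64_t rounded = round_down_pow2(median, align);
    return rounded > base ? rounded : median;
}
```

For example, `choose_split_inum(0, 100, 64)` splits at 64, while `choose_split_inum(64, 100, 64)` keeps 100 because rounding would land on the leaf's own base.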
2008-08-27 03:10 windows is way worse 2008-08-27 03:10 oh wait, russian? than it is worse 2008-08-27 03:11 I think 2008-08-27 03:11 spyware 2008-08-27 03:11 based on my knowledge of its kernel structure 2008-08-27 03:11 hrm true, joelle does have to reboot a lot 2008-08-27 03:12 the last round of windows kernel development was mainly copying linux 2.6 features 2008-08-27 03:12 we suck at oom, so they suck worse 2008-08-27 03:12 funny really 2008-08-27 03:19 there we go 2008-08-27 03:19 nice meaty post 2008-08-27 03:19 on allocation strategy 2008-08-27 03:19 barely scratched the surface though 2008-08-27 03:20 that ones been cooking a while 2008-08-27 03:21 that post? 2008-08-27 03:21 wrote it starting a couple hours ago 2008-08-27 03:21 but yes 2008-08-27 03:21 been blathering about it a while 2008-08-27 03:53 valgrind errors in ileaf 2008-08-27 03:53 must have been there a while 2008-08-27 05:55 -!- pgquiles(~pgquiles@189.Red-81-44-176.dynamicIP.rima-tde.net) has joined #tux3 2008-08-27 07:18 you guys had a late night 2008-08-27 07:18 talking about russian porn and all 2008-08-27 07:18 http://lxr.linux.no/linux+v2.6.26.3/CREDITS 2008-08-27 07:18 might be a source of tux3 developers 2008-08-27 07:19 flips: when you get a chance, troll that list and mark who's been naughty or nice. I'll fire off emails to the nice ones 2008-08-27 08:04 and another request- I could use a more condensed general description of Tux3 illustrating the main features/benefits. 2008-08-27 10:16 tim_dimm, I'll pen something 2008-08-27 10:16 question for u 2008-08-27 10:16 how would map reduce fit in along with tux3? 2008-08-27 10:16 or visa versa 2008-08-27 10:17 no idea 2008-08-27 10:17 lot of discussion about hadoop / map reduce on the cloud computing list 2008-08-27 10:17 Maybe shapor knows something about it 2008-08-27 11:45 tim_dimm: i don't think hadoop stuff really asks much of the filesystem 2008-08-27 14:46 So... 
exabyte file written 2008-08-27 14:46 with Tux3, an exabyte means an exabyte 2008-08-27 14:47 not an exabyte less one 2008-08-27 14:51 [12818] tuxseek: seek to 0xffffffffffffff4 2008-08-27 14:51 [12818] tuxread: read ffffffffffffff4/c 2008-08-27 14:51 0xbfd504a4: 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 "hello world!" 2008-08-27 14:53 [12843] tuxread: file pos 1000000000000000/c 2008-08-27 14:53 whoops 2008-08-27 14:54 flips: have you taken a decision about time resolution? 2008-08-27 14:54 [12861] tuxread: file pos 1000000000000000 2008-08-27 14:54 pgquiles, general idea is to go with 48 bits unless somebody jumps up with a use case that has to have more 2008-08-27 14:55 pgquiles, and 32 bits for atime, using the measures we discussed on the list to avoid breakage 2008-08-27 14:55 but not set in stone 2008-08-27 14:55 ext3's 1 second resolution is annoying for some uses 2008-08-27 14:55 that is too crude, yes 2008-08-27 14:56 millisecond resolution ought to be good enough for a file 2008-08-27 14:56 agreed 2008-08-27 14:56 0.1 seconds is good enough, IMHO 2008-08-27 14:56 probably 2008-08-27 14:56 1 second not quite 2008-08-27 14:57 100ms is NTFS' time resolution and my users are happy with that 2008-08-27 14:57 we should merge #zumastor and #tux3 in #daniel's :-P 2008-08-27 14:59 they're pretty separate 2008-08-27 14:59 for now 2008-08-27 14:59 at least until tux3 is ready to replicate 2008-08-27 15:29 ok, now tuxread and tuxwrite return EIO for access about 1 EB 2008-08-27 15:29 above 2008-08-27 15:43 mmm potstickers 2008-08-27 20:27 flips: minor (L) cleanups and Makefile committed to my repo 2008-08-27 20:32 I'll pull 2008-08-27 20:48 sefault boo 2008-08-27 20:48 inode->btree = new_btree(sb, &dtree_ops); // error??? 
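The exabyte demo reads 12 bytes ("hello world!") at 0xffffffffffffff4, which ends exactly at 0x1000000000000000 = 2^60. The sketch below is a minimal overflow-safe bounds check matching "an exabyte means an exabyte, not an exabyte less one". `MAX_FILE_POS` and `check_file_range` are hypothetical names. The function returns `-EIO` following the kernel's negative-errno convention.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical limit: 1 EiB, the 0x1000000000000000 file position in the log. */
#define MAX_FILE_POS (UINT64_C(1) << 60)

/* Return 0 if [pos, pos + len) lies entirely at or below MAX_FILE_POS,
 * otherwise -EIO. Compares len against the remaining room instead of
 * computing pos + len, so it cannot overflow for any 64-bit inputs. */
static int check_file_range(uint64_t pos, uint64_t len)
{
    if (pos > MAX_FILE_POS || len > MAX_FILE_POS - pos)
        return -EIO;
    return 0;
}
```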
2008-08-27 20:48 needs to be handled 2008-08-27 20:49 segfault* 2008-08-27 20:54 ah this is the real culprit: 2008-08-27 20:54 486 struct buffer *rootbuf = new_node(&btree); 2008-08-27 20:55 hmm lots of missing error checking 2008-08-27 21:14 yup 2008-08-27 21:14 need to start using ERR_PTR 2008-08-27 21:15 which allows an errno to be overloaded on a pointer return 2008-08-27 21:15 used extensively in kernel 2008-08-27 21:16 hmm 2008-08-27 21:17 http://lxr.linux.no/linux+v2.6.26.3/include/linux/err.h#L22 2008-08-27 21:17 the getblk interface is classic evil 2008-08-27 21:18 returns NULL if anything goes wrong 2008-08-27 21:18 so everybody makes up some different random error to report higher 2008-08-27 21:20 whats the point of ERR_PTR? 2008-08-27 21:20 to return an error without adding a new parameter 2008-08-27 21:21 instead of checking for NULL return you check for IS_ERR 2008-08-27 21:21 just changes a type though 2008-08-27 21:21 looks quite pointless? 2008-08-27 21:21 it's how you use it 2008-08-27 21:22 if (IS_ERR(result = some_function())) return result; 2008-08-27 21:23 and some_function returns ERR_PTR(ENOMEM) etc 2008-08-27 21:24 I can never remember if it is supposed to be ERR_PTR(ENOMEM) or ERR_PTR(-ENOMEM) 2008-08-27 21:24 one of those will cause oopses 2008-08-27 21:24 ah i see 2008-08-27 21:24 hah 2008-08-27 21:24 crappy interface actually 2008-08-27 21:24 but 2008-08-27 21:24 everything in C is crappy 2008-08-27 21:25 so it fits 2008-08-27 21:25 exceptions yeah 2008-08-27 21:25 lack thereof 2008-08-27 21:25 errno! 2008-08-27 21:25 well we should adopt that interface 2008-08-27 21:25 if its what we need to kernel port anyway 2008-08-27 21:25 will make the port easier 2008-08-27 21:25 yes 2008-08-27 21:25 you'll see various shouts to myself about that 2008-08-27 21:32 ERR_PTR is in 2008-08-27 21:38 why is DATA_BTREE_ATTR called "root" ?
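To settle the ERR_PTR(ENOMEM) vs ERR_PTR(-ENOMEM) question: the kernel convention is a negative errno, as in `return ERR_PTR(-ENOMEM);`. `IS_ERR` only recognizes the top `MAX_ERRNO` (4095) values of the address space. A positive `ENOMEM` therefore becomes a small pointer such as 0xc that `IS_ERR` treats as valid, and dereferencing it is the oops. Below is a userspace sketch modeled on `include/linux/err.h`. `struct buffer`, `new_buffer` and `use_buffer` are hypothetical stand-ins for calls like `new_node()`.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins modeled on the kernel's include/linux/err.h:
 * errors occupy the last MAX_ERRNO values of the address space. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)   /* error must be negative, e.g. -ENOMEM */
{
    return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical buffer type and constructor. */
struct buffer { unsigned char data[256]; };

static struct buffer *new_buffer(int simulate_failure)
{
    struct buffer *buf = simulate_failure ? NULL : malloc(sizeof *buf);
    if (!buf)
        return ERR_PTR(-ENOMEM);
    return buf;
}

/* Caller pattern: propagate the encoded error instead of inventing one. */
static long use_buffer(int simulate_failure)
{
    struct buffer *buf = new_buffer(simulate_failure);
    if (IS_ERR(buf))
        return PTR_ERR(buf);
    free(buf);
    return 0;
}
```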
2008-08-27 21:47 hmm 2008-08-27 21:47 because it's the root of a btre 2008-08-27 21:47 btree 2008-08-27 21:47 gives the on-disk block that's the root 2008-08-27 21:48 wouldn't "data" make more sense though 2008-08-27 21:48 in the context of the other attributes 2008-08-27 21:49 it's a kind of data attribute 2008-08-27 21:49 one of four kinds 2008-08-27 21:49 it's btree data 2008-08-27 21:49 could call it btree, but that's already taken 2008-08-27 21:49 that is the in-memory version 2008-08-27 21:50 that has a bunch of extra fields and no endian requirment 2008-08-27 21:51 ACTION reads the inode attributes post 2008-08-27 21:51 which does? 2008-08-27 21:51 struct btree 2008-08-27 21:51 like struct inode, it's the cached version 2008-08-27 21:51 of a btree root 2008-08-27 21:54 ah 2008-08-27 22:11 zzz time 2008-08-28 02:46 -!- cdk(~chinmay@121.246.36.77) has joined #tux3 2008-08-28 02:48 -!- cdk(~chinmay@121.246.36.77) has left #tux3 2008-08-28 03:12 -!- pgquiles(~pgquiles@189.Red-81-44-176.dynamicIP.rima-tde.net) has joined #tux3 2008-08-28 12:02 wow, time flies. It's already sk8 oclock 2008-08-28 12:14 hmm, does the root directory have a dirent? 2008-08-28 12:15 I don't think it does 2008-08-28 12:24 ACTION starts to draft the lkml post for next week 2008-08-28 16:19 simplifying assumption: inode attributes are always encoded in the same order 2008-08-28 16:19 I wonder if this makes things simpler 2008-08-28 16:19 hmm 2008-08-28 16:19 maybe random order is simpler 2008-08-28 16:19 always insert new attributes at the end of the list 2008-08-28 16:21 so updating attributes is a 3 step process: 0) figure out total size of attributes and old size then expand or shrink inode as necessary 1) remove attributes no longer used 2) insert new attributes at end of list 3) rewrite any attributes that changed 2008-08-28 16:21 that's 3 steps counting from 0 ;-) 2008-08-28 16:21 bh, there? 
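The "lame first cut" of the attribute update can be sketched as a copy into a fresh buffer. Walk the old attributes in order, re-encode changed ones, copy unchanged ones, and skip dropped ones. Then append attributes that were not present before. Because everything is decoded before the output is used, the inode can be resized afterwards, which sidesteps the "can't shrink the inode until after decoding" problem. The `[kind:1][len:1][payload]` encoding and all names here are hypothetical, not Tux3's on-disk format. The in-place optimization for attributes that neither change nor move is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute encoding: [kind:1][len:1][payload:len]. */
enum { ATTR_KINDS = 16 };

struct attr_update {
    int drop;               /* remove this attribute */
    int set;                /* store value (changed or newly added) */
    uint8_t len;
    const uint8_t *value;
};

/* Append one encoded attribute; returns 0, or -1 if out is full. */
static int emit_attr(uint8_t *out, size_t outsize, size_t *pos,
                     uint8_t kind, uint8_t len, const uint8_t *payload)
{
    if (outsize - *pos < 2u + len)
        return -1;
    out[*pos] = kind;
    out[*pos + 1] = len;
    memcpy(out + *pos + 2, payload, len);
    *pos += 2u + len;
    return 0;
}

/* Returns the new encoded size, or -1 on malformed input or overflow. */
static long rewrite_attrs(const uint8_t *old, size_t oldsize,
                          const struct attr_update upd[ATTR_KINDS],
                          uint8_t *out, size_t outsize)
{
    int seen[ATTR_KINDS] = { 0 };
    size_t in = 0, pos = 0;

    /* Pass 1: existing attributes, in their current order. */
    while (in < oldsize) {
        if (oldsize - in < 2)
            return -1;                          /* truncated header */
        uint8_t kind = old[in], len = old[in + 1];
        if (kind >= ATTR_KINDS || oldsize - in - 2 < len)
            return -1;                          /* unknown kind or overrun */
        const struct attr_update *u = &upd[kind];
        seen[kind] = 1;
        if (!u->drop) {
            int err = u->set
                ? emit_attr(out, outsize, &pos, kind, u->len, u->value)
                : emit_attr(out, outsize, &pos, kind, len, old + in + 2);
            if (err)
                return -1;
        }
        in += 2u + len;
    }
    /* Pass 2: brand-new attributes go at the end of the list. */
    for (int kind = 0; kind < ATTR_KINDS; kind++) {
        const struct attr_update *u = &upd[kind];
        if (!seen[kind] && u->set && !u->drop &&
            emit_attr(out, outsize, &pos, (uint8_t)kind, u->len, u->value))
            return -1;
    }
    return (long)pos;
}
```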
2008-08-28 16:34 let's try this algorithm: 2008-08-28 16:34 / for each attribute from bottom to top 2008-08-28 16:34 / if the attribute changed 2008-08-28 16:34 / encode new attribute 2008-08-28 16:34 / else unless the attribute is dropped 2008-08-28 16:34 / copy old attribute 2008-08-28 16:34 / for each new attribute 2008-08-28 16:34 / encode new attribute 2008-08-28 16:35 with a small optimization to avoid doing anything if the attribute neither has to be change or moved 2008-08-28 16:47 oh, slight mistake 2008-08-28 16:47 can't shrink the inode until after decoding the attributes 2008-08-28 16:48 iattr.c:218: warning: format '%i' expects type 'int', but argument 2 has type 'long int' 2008-08-28 16:48 should be %ti 2008-08-28 16:48 ah 2008-08-28 16:49 hm you want to copy all the attributes instead of change in place? 2008-08-28 16:49 not really 2008-08-28 16:49 but for a first cut maybe it's easiest 2008-08-28 16:49 lame 2008-08-28 16:50 you can optimize it 2008-08-28 16:50 heh ok, i think it will be less code to change in place 2008-08-28 16:50 I doubt it, but show me 2008-08-28 16:50 the lame version should be working sometime later today 2008-08-28 16:51 well then i wont get to optimize it until next week 2008-08-28 16:51 ACTION is not bringing a laptop on the motorcycle 2008-08-28 16:51 incentive for you to get back with your fingers intact 2008-08-28 16:51 I'll write it especially lamely to that end 2008-08-28 16:52 the %ti warning doesn't show up on 32 bit apparently 2008-08-28 16:53 yeah since the size matches i guess 2008-08-28 16:54 those unpack and repack ops are really efficient by the way 2008-08-28 16:54 they translate into just a couple of asm instructions most of the time 2008-08-28 16:54 once declared inline 2008-08-28 17:01 weekend reading material: http://students.cs.byu.edu/~cs460ta/cs460/labs/pthreads.html 2008-08-28 17:07 wow it's sk8 oclock again 2008-08-28 17:40 -!- olgagirl(~olgagirl@ANantes-257-1-135-233.w90-32.abo.wanadoo.fr) has joined 
#tux3 2008-08-28 19:12 what's with these racy nicks from france? 2008-08-28 19:34 -!- lafille(~lafille@ANantes-257-1-135-233.w90-32.abo.wanadoo.fr) has joined #tux3 2008-08-28 19:56 cool, everything also works with 256 byte blocks 2008-08-28 19:56 including writing an exabyte sparse file 2008-08-28 23:18 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-08-28 23:45 -!- konrad(~konrad@c-24-16-77-169.hsd1.mn.comcast.net) has joined #tux3 2008-08-29 07:10 -!- pgquiles(~pgquiles@189.Red-81-44-176.dynamicIP.rima-tde.net) has joined #tux3 2008-08-29 10:17 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-08-29 11:47 -!- konrad(~konrad@c-24-16-74-109.hsd1.mn.comcast.net) has joined #tux3 2008-08-29 12:45 -!- konrad(~konrad@c-24-16-74-109.hsd1.mn.comcast.net) has joined #tux3 2008-08-29 12:45 -!- bh(~billh@ip68-107-26-122.sd.sd.cox.net) has joined #tux3 2008-08-29 20:51 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-08-30 05:49 -!- flips(~phillips@phunq.net) has joined #tux3 2008-08-30 12:08 -!- pgquiles(~pgquiles@195.Red-83-41-45.dynamicIP.rima-tde.net) has joined #tux3 2008-08-30 13:44 -!- pgquiles_(~pgquiles@161.Red-83-41-44.dynamicIP.rima-tde.net) has joined #tux3 2008-08-30 16:22 hmm, it's about sk8 oclock 2008-08-30 16:22 enough refactoring for the moment 2008-08-30 17:20 -!- pgquiles__(~pgquiles@161.Red-83-41-44.dynamicIP.rima-tde.net) has joined #tux3 2008-08-30 23:39 heh 2008-08-30 23:39 my old skates have 80mm wheels 2008-08-31 07:19 -!- pgquiles(~pgquiles@64.Red-81-44-62.dynamicIP.rima-tde.net) has joined #tux3 2008-08-31 10:36 -!- pgquiles(~pgquiles@153.Red-83-35-242.dynamicIP.rima-tde.net) has joined #tux3 2008-08-31 13:08 -!- pgquiles(~pgquiles@153.Red-83-35-242.dynamicIP.rima-tde.net) has joined #tux3 2008-08-31 17:37 tux3 is the 133rd google hit for "filesystem" 2008-08-31 17:37 this needs to be improved 2008-08-31 17:37 by posting working code of course 2008-08-31 17:37 but not only that 2008-09-01 
02:37 -!- pgquiles(~pgquiles@153.Red-83-35-242.dynamicIP.rima-tde.net) has joined #tux3 2008-09-01 02:52 -!- pgquiles(~pgquiles@110.Red-83-41-45.dynamicIP.rima-tde.net) has joined #tux3 2008-09-01 03:01 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-01 11:50 flips: revision 199 broke inode.c, can pull fix from me 2008-09-01 11:50 how broke? 2008-09-01 11:51 I only see 198 revisions in my repo 2008-09-01 11:57 removed parameter from ext2_dump_entries 2008-09-01 11:57 ah yeah revision numbers aren't global 2008-09-01 11:58 already fixed 2008-09-01 11:58 here 2008-09-01 11:58 ok ;) 2008-09-01 11:58 the new interpreter makes it much easier to find bugs 2008-09-01 11:58 I found a bunch 2008-09-01 11:58 working on an inode table leaf split corruption one now 2008-09-01 11:59 the checking functions are really badly needed 2008-09-01 11:59 inode table block 0x0/40 (8c bytes free) 2008-09-01 11:59 0x0: [0] mode 0000000 uid 0 gid 0 root 22:1 ctime 0 size 2000 2008-09-01 11:59 0xd: [40] mode 0100700 uid 0 gid 0 root 24:1 2008-09-01 11:59 0x27: [64] mode 0100700 uid 0 gid 0 root 27:1 ctime 0 size ffffffffffffff 2008-09-01 11:59 resize inum 0xd at 0x28 from 24 to 40 2008-09-01 11:59 inode table block 0x0/40 (7c bytes free) 2008-09-01 11:59 0x0: [0] mode 0000000 uid 0 gid 0 root 22:1 ctime 0 size 2000 2008-09-01 11:59 you haven't checked it in yet? 
2008-09-01 11:59 0xd: [40] mode 0100700 uid 0 gid 0 root 81c00000:24576 2008-09-01 11:59 0x27: [80] mode 0100700 uid 0 gid 0 root 27:1 ctime 0 size ffffffffffffff 2008-09-01 11:59 not yet 2008-09-01 12:00 right after this bug 2008-09-01 12:00 cool 2008-09-01 12:00 see the root attribute of inode d get messed up by the resize 2008-09-01 12:00 ah 2008-09-01 12:00 yeah 2008-09-01 12:01 for one thing, inode d isn't at offset 28, I don't know why it thinks it is 2008-09-01 12:01 anyway 2008-09-01 12:01 this one is my mess 2008-09-01 12:02 it turns out Tux3 can only do 64 petabytes with 256 byte blocks 2008-09-01 12:09 thats it?! 2008-09-01 12:09 it's because the dump was printing in decimal :p 2008-09-01 12:12 you mean pebibytes? 2008-09-01 12:12 http://en.wikipedia.org/wiki/Petabyte 2008-09-01 12:13 ACTION doesn't subscribe to that hairy footed nonsense 2008-09-01 12:13 why does anyone use base 10 anyway? 2008-09-01 12:14 when describing these things 2008-09-01 12:14 blame hard drive manufacturers i suppose 2008-09-01 12:16 why does anyone use base 10 for anything? 2008-09-01 12:16 something to do with counting on fingers and toes 2008-09-01 12:27 there was no bug 2008-09-01 12:28 intermediate state produced funny behavior 2008-09-01 12:40 on bug down 2008-09-01 12:40 parens around a conditional expression 2008-09-01 12:41 oh, that took care of two bugs 2008-09-01 12:41 nice 2008-09-01 12:41 ok time to check in 2008-09-01 12:41 now thats efficient ;) 2008-09-01 12:43 ./tux3 read --seek 72057594037927930 foodev foo <- this works 2008-09-01 12:43 reads the 64th petabyte of file foo in device foodev, with 256 byte blocks 2008-09-01 13:04 according to this ( http://blogs.netapp.com/standards_watch/2007/12/emc-netapp-dona.html ), there should be an NDMP implementation available from SNIA, which would make it easier to implement NDMP support in strigi, but yesterday I was unable to find that source code :-? 
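The 64 petabyte figure is consistent with a file's logical block index being 48 bits wide. That width is an assumption here, not something stated in the log. Under it, 2^48 blocks of 2^8 bytes gives 2^56 = 72057594037927936 bytes, which is exactly 64 binary pebibytes (about 72.06 decimal petabytes). The `--seek 72057594037927930` read sits 6 bytes below that limit. With 4 KiB blocks, the same 48 bits would give 2^60 bytes, the 1 EB limit. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption for illustration: logical block indexes are 48 bits. */
enum { FILE_INDEX_BITS = 48 };

/* Largest file size in bytes with 2^index_bits blocks of 2^blockbits
 * bytes each; valid while index_bits + blockbits < 64. */
static uint64_t max_file_bytes(unsigned index_bits, unsigned blockbits)
{
    return UINT64_C(1) << (index_bits + blockbits);
}

/* Whole binary pebibytes (2^50 bytes). */
static uint64_t to_pebibytes(uint64_t bytes)
{
    return bytes >> 50;
}
```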
2008-09-01 13:04 oops, wrong channel 2008-09-01 13:04 hi all, btw :-) 2008-09-01 13:04 :) 2008-09-01 13:05 flips: going to add tux3 to the Makefile? 2008-09-01 13:05 not before I have a nap 2008-09-01 13:05 fee free 2008-09-01 13:05 feel free 2008-09-01 13:05 I'm writing a little post 2008-09-01 13:05 paste the cmdline in your shell history to build it ;) 2008-09-01 13:05 which should help make a test 2008-09-01 13:06 yes 2008-09-01 13:06 so i can add it without thinking as much 2008-09-01 13:07 g99 -g -Wall -lpopt buffer.c diskio.c tux3.c -otux3 2008-09-01 14:02 hey 2008-09-01 14:02 flips: just came back last night from Burning Man 2008-09-01 14:03 burned out? 2008-09-01 14:03 eh ? 2008-09-01 14:03 no, I had a blast 2008-09-01 14:03 joke 2008-09-01 14:04 oh ok, yeah, I figured half of Google's infrastructure engineering went out there as well 2008-09-01 14:05 I checked in several thousands lines of patches while you were taking care of more important things ;-) 2008-09-01 14:06 nice, I have a lot of work todo but I'm half clueless about certain parts of the scheduler code 2008-09-01 14:07 gregory is coming up with fixes for various scheduler path issues, but I doubt that it's going to fix the latency problem. The cross locks are very problematic 2008-09-01 14:08 I'll get gone for a bit 2008-09-01 14:08 ok 2008-09-01 17:10 back 2008-09-01 18:36 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-01 19:33 flips: what does checking the integrity of leaf nodes involve? 2008-09-01 19:35 also, gcc 4.3 won't build the tux3 tests 2008-09-01 20:20 ld complains that /usr/lib/libpopt.so is incompatible 2008-09-01 20:20 perhaps because tux3 is -std=gnu99? 
2008-09-01 20:21 oh wait 2008-09-01 20:21 need to install 64-bit popt-devel 2008-09-01 20:23 konrad, hi 2008-09-01 20:24 hi 2008-09-01 20:24 konrad, it would be a great start just to check that the entries are all in non-descending order 2008-09-01 20:24 for dleaf 2008-09-01 20:25 for both dleaf and ileaf, the upside down dictionaries should contain offsets in non-descending order 2008-09-01 20:25 the bottom of the dictionary should not be below the top of the highest entry 2008-09-01 20:25 can get fancier from there, but that will already detect most corruption 2008-09-01 20:25 ok 2008-09-01 20:26 :-) 2008-09-01 20:26 sounds like a hack about to begin 2008-09-01 20:26 shh 2008-09-01 20:26 if that's what it takes :-) 2008-09-01 20:27 before that 2008-09-01 20:27 another pretty straightforward project is to add more commands to tux3.c 2008-09-01 20:27 I get some weird errors building tux3.c 2008-09-01 20:27 like "remove" 2008-09-01 20:27 ah 2008-09-01 20:27 for some reason references to the inline functions go through ld 2008-09-01 20:27 you need to build with gcc -std=gnu99 2008-09-01 20:27 I am 2008-09-01 20:27 the errors are? 2008-09-01 20:28 ACTION checks to see if make works 2008-09-01 20:28 tux3/user/test/iattr.c:95: undefined reference to `encode16' 2008-09-01 20:28 x10 2008-09-01 20:28 you 2008-09-01 20:28 um 2008-09-01 20:28 lines 98, 99, 100, 103, 104, 107, 111, 114 2008-09-01 20:29 and 128, ...
2008-09-01 20:29 some others 2008-09-01 20:29 that is odd 2008-09-01 20:29 check your compile output 2008-09-01 20:29 there are some warnings about iattr.c: In function ‘decode16’: 2008-09-01 20:29 iattr.c:20: warning: ‘be_to_u16’ is static but used in inline function ‘decode16’ which is not static 2008-09-01 20:29 just a sec 2008-09-01 20:29 ah 2008-09-01 20:29 interesting 2008-09-01 20:29 ok 2008-09-01 20:29 what it to static inline 2008-09-01 20:30 change it to static inline 2008-09-01 20:30 I'll do that right now 2008-09-01 20:30 k 2008-09-01 20:30 should I do it to, or will you tell me when to pull? 2008-09-01 20:30 just about done 2008-09-01 20:31 builds now 2008-09-01 20:31 updated in repo 2008-09-01 20:31 good 2008-09-01 20:31 what's your gcc version? 2008-09-01 20:31 gcc --version 2008-09-01 20:31 4.3.0 2008-09-01 20:31 20080428 2008-09-01 20:31 I'm 4.1.2 2008-09-01 20:32 looks like a gcc regression 2008-09-01 20:32 smells like 2008-09-01 20:32 certainly possible 2008-09-01 20:32 but I' 2008-09-01 20:32 but I'm ok with this resolution 2008-09-01 20:32 I'm pretty happy with those endian macros 2008-09-01 20:32 very efficiently implemented 2008-09-01 20:32 mhm 2008-09-01 20:32 didn't even know they were there until last week 2008-09-01 20:33 found them by accident 2008-09-01 20:33 really essential 2008-09-01 20:33 #include <- the magic words 2008-09-01 20:34 yep 2008-09-01 20:34 I'm looking at that right now 2008-09-01 20:34 29 /* Return a value with all bytes in the 16 bit argument swapped. */ 2008-09-01 20:34 30 #define bswap_16(x) __bswap_16 (x) 2008-09-01 20:34 does it do-the-right-thing on big endian archs? 
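The gcc 4.3 link errors are consistent with C99 inline semantics, which gcc 4.3 applies under `-std=gnu99`. A non-static `inline` definition emits no out-of-line symbol, so any call the compiler declines to inline becomes an undefined reference. C99 also forbids a non-static inline function from referencing a static one, which explains the `be_to_u16` warning. Making the helpers `static inline` avoids both problems. The sketch below uses shifts rather than bswap, so it writes big-endian bytes on any host with no per-arch `#ifdef`. Whether gcc reduces it to a single bswap depends on the compiler version. The signatures are hypothetical, not tux3's.

```c
#include <stdint.h>

/* Big-endian encode/decode helpers; static inline so no external
 * definition is ever required. Signatures are illustrative only. */
static inline unsigned char *encode16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)v;
    return p + 2;
}

static inline unsigned char *encode48(unsigned char *p, uint64_t v)
{
    for (int i = 5; i >= 0; i--) {      /* fill from the least significant end */
        p[i] = (unsigned char)v;
        v >>= 8;
    }
    return p + 6;
}

static inline const unsigned char *decode16(const unsigned char *p, uint16_t *v)
{
    *v = (uint16_t)(p[0] << 8 | p[1]);
    return p + 2;
}

static inline const unsigned char *decode48(const unsigned char *p, uint64_t *v)
{
    uint64_t x = 0;
    for (int i = 0; i < 6; i++)         /* most significant byte first */
        x = x << 8 | p[i];
    *v = x;
    return p + 6;
}
```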
2008-09-01 20:34 you just did your first bug hunt ;-) 2008-09-01 20:34 yay 2008-09-01 20:35 I still get a couple format string warnings 2008-09-01 20:35 bswap is a 2 byte asm instruction as I recall 2008-09-01 20:35 runs at superscaler speed these days I think - can do more than one bswap per cycle 2008-09-01 20:35 what about on ppc, m68k or other BE archs? 2008-09-01 20:35 in other words, as close to free as it gets 2008-09-01 20:35 does it omit those or still try to swap? 2008-09-01 20:36 there are similar instructions on some of the other arches 2008-09-01 20:36 but since the native resolution for tux3 is bigendian, ppc is fine 2008-09-01 20:36 no swapping to do 2008-09-01 20:36 k 2008-09-01 20:36 big endian is _way_ nicer for debugging 2008-09-01 20:37 big endian is way nicer for everything :) 2008-09-01 20:37 bummer x86 isn't big endian 2008-09-01 20:37 had to stare at the hexdumps sometimes for a while, wondering why the lsb was up at the high end of the struct, I'm so used to the braindamaged intel order 2008-09-01 20:37 x86 is most braindamage, that happened to be implemented better than the other guys 2008-09-01 20:37 too bad about that 2008-09-01 20:38 motorola really blew it 2008-09-01 20:38 ibm's doing ok with ppc 2008-09-01 20:38 but I'd like to see them do more on power efficiency 2008-09-01 20:38 ppc rules the console world 2008-09-01 20:38 which is getting to be more machines than the biz world even 2008-09-01 20:39 more than certain search engine operators even 2008-09-01 20:41 Are: 2008-09-01 20:41 inode.c:421: warning: format ‘%Lx’ expects type ‘long long unsigned int’, but argument 2 has type ‘block_t’ 2008-09-01 20:41 tux3.c:139: warning: format ‘%Li’ expects type ‘long long int’, but argument 2 has type ‘u64’ 2008-09-01 20:41 tux3.c:175: warning: format ‘%Li’ expects type ‘long long int’, but argument 2 has type ‘u64’ 2008-09-01 20:41 supposed to happen? 
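Warnings like these typically appear on 64-bit hosts, where a 64-bit type such as `uint64_t` is usually `unsigned long` rather than `unsigned long long`. The `%L`/`%ll` formats then mismatch even though the sizes agree. On 32-bit hosts the type is `unsigned long long`, so the warnings disappear there. The sketch shows two portable fixes: an explicit cast, which is what a cast macro like tux3's `(L)` does, or the `<inttypes.h>` `PRIx64` macro. The `block_t` typedef is an assumption.

```c
#include <inttypes.h>
#include <stdio.h>

typedef uint64_t block_t;   /* assumption: a 64-bit block number type */

/* Option 1: cast explicitly and use %llx; correct on 32- and 64-bit hosts. */
static int format_block_cast(char *buf, size_t size, block_t block)
{
    return snprintf(buf, size, "block %llx", (unsigned long long)block);
}

/* Option 2: use the standard format macro for uint64_t, no cast needed. */
static int format_block_pri(char *buf, size_t size, block_t block)
{
    return snprintf(buf, size, "block %" PRIx64, block);
}
```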
2008-09-01 20:46 cast the printf arg using (L) 2008-09-01 20:47 that's a tux3 macro, same as (long long unsigned) 2008-09-01 20:47 I'm running 32 bit here, so you need to post your hg patch to the mailing list 2008-09-01 20:48 ACTION has to get his hammer machine online 2008-09-01 20:50 right 2008-09-01 20:51 that's what I thought, so I made the patch already 2008-09-01 20:51 was just waiting on posting it 2008-09-01 20:55 tux3 will need to go GPLv2 at some point to get into the kernel, no? 2008-09-01 21:02 applied, thanks 2008-09-01 21:03 that is correct, there is a post about that 2008-09-01 21:04 I reserve the right to relicense tux3, including the downgrade to v2 for the kernel port 2008-09-01 21:04 I suppose I should ping Eben and see if he likes my slight hack of his license ;-) 2008-09-01 21:16 -!- caoliver(~oliver@75-134-208-20.dhcp.trcy.mi.charter.com) has joined #tux3 2008-09-01 21:16 -!- caoliver(~oliver@75-134-208-20.dhcp.trcy.mi.charter.com) has left #tux3 2008-09-01 21:51 flips: what are the offsets? 2008-09-01 21:51 offsets? 
2008-09-01 21:51 oh 2008-09-01 21:51 in dleaf 2008-09-01 21:51 16 bit address within a dleaf 2008-09-01 21:51 looks like there's one per entry 2008-09-01 21:51 ok 2008-09-01 21:51 or 16 bit offset from the beginning of data in an ileaf 2008-09-01 21:52 ileaf would be an easier place to start 2008-09-01 21:52 dleaf has a two level index, both levels upside down, which is a little confusing 2008-09-01 21:52 heh 2008-09-01 21:53 the other confusing detail is that the 0th index entry is not actually represented, it is assumed to be zero 2008-09-01 21:53 for both dleaf and ileaf 2008-09-01 21:54 I think I might be able to code an inline to make that a little clearer 2008-09-01 22:01 another complication for ileaf is that leaf->count is allowed to be zero 2008-09-01 22:02 that means that dict[-i] can be invalid either because i is zero or leaf->count is zero, which usually implies the same, but not for ileaf 2008-09-01 22:02 which allows offsets higher than leaf->count 2008-09-01 22:03 because the ileaf dictionary can be extended to accommodate. 2008-09-01 22:57 how is the size of the zeroth inode an ileaf found? 2008-09-01 22:59 inode of an ileaf* 2008-09-01 23:01 er, if leaf->count is zero 2008-09-01 23:02 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-01 23:07 http://pastie.org/264354.txt <-- does that look about right?
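The leaf sanity rules described here can be sketched as a single check. The upside-down dictionary offsets must be non-descending, and the top of the highest entry must not reach below the bottom of the dictionary. The zeroth index entry is implicit (taken as 0), and a count of zero means no dictionary entries exist at all. The byte layout below (a native-endian u16 count header, data growing up, dictionary growing down from the end of the block) is a hypothetical simplification, not the real Tux3 ileaf format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout (NOT the real Tux3 format):
 *   [0, 2)                count, native-endian u16
 *   [ILEAF_HDR, ...)      entry data, growing upward
 *   last 2 * count bytes  dictionary, growing downward: dict[-i], i = 1..count,
 *                         is the end offset of entry i - 1 from the data start;
 *                         dict[0] is not stored and is taken as 0. */
enum { ILEAF_HDR = 2 };

static unsigned load16(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);            /* avoids unaligned access */
    return v;
}

/* dict[-i] with the implicit zeroth entry; caller guarantees i <= count. */
static unsigned dict_entry(const unsigned char *block, size_t size, unsigned i)
{
    return i == 0 ? 0 : load16(block + size - 2 * (size_t)i);
}

/* Returns 0 if the leaf looks sane, -1 if it appears corrupt. */
static int ileaf_check(const unsigned char *block, size_t size)
{
    if (size < ILEAF_HDR)
        return -1;
    unsigned count = load16(block);
    size_t dict_bytes = 2 * (size_t)count;
    if (dict_bytes > size - ILEAF_HDR)
        return -1;                      /* dictionary overlaps the header */
    size_t room = size - ILEAF_HDR - dict_bytes;  /* data start to dict bottom */
    unsigned prev = 0;
    for (unsigned i = 1; i <= count; i++) {       /* count == 0: nothing to read */
        unsigned off = dict_entry(block, size, i);
        if (off < prev)
            return -1;                  /* offsets must not descend */
        prev = off;
    }
    if (prev > room)
        return -1;                      /* highest entry runs into the dictionary */
    return 0;
}
```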
2008-09-01 23:17 er 2008-09-01 23:18 I guess dict[-1] exists even if there's only the zeroth inum 2008-09-01 23:18 therefore dict[-btree->entries_per_leaf - 1] is part of the dict too 2008-09-02 00:06 alright I'm heading to bed 2008-09-02 00:13 sorry, didn't notice your chat 2008-09-02 00:13 if you type "flips" then the tab lights up 2008-09-02 00:15 konrad, your writeup is accurate 2008-09-02 00:16 konrad, if leaf->count is zero, dict[-1] does not exist either 2008-09-02 00:17 valgrind will complain if you try to pretend it exists ;-) 2008-09-02 00:17 ACTION loves valgrind 2008-09-02 06:27 flips: right 2008-09-02 06:42 sorry for the exploded code 2008-09-02 09:30 it was nice code 2008-09-02 09:30 now it's compressed code ;-) 2008-09-02 09:31 yay 2008-09-02 09:32 now what? 2008-09-02 09:49 more checks? 2008-09-02 09:49 let me see 2008-09-02 09:49 what else can be checked about ileaf 2008-09-02 09:50 could check for unknown attributes 2008-09-02 09:50 or could tackle dleaf, much harder 2008-09-02 09:50 I'll do the former then start on the latter, I guess. Sound good? 2008-09-02 09:51 sounds good 2008-09-02 09:51 excellent 2008-09-02 09:51 attr_check 2008-09-02 09:52 I wonder why mailman fails to post some list posts 2008-09-02 09:53 I see your answer to masoud, but not masoud's post 2008-09-02 09:53 ah 2008-09-02 09:53 ACTION looks for logs 2008-09-02 09:53 he didn't write to list 2008-09-02 09:53 just replied to me 2008-09-02 09:53 ah right 2008-09-02 09:53 so I CC'd the list 2008-09-02 09:53 replying back to list is good 2008-09-02 09:59 ok time for me to start another hack 2008-09-02 09:59 truncate I think it was 2008-09-02 10:04 yep 2008-09-02 10:17 hm, how do I set up an hg username? 2008-09-02 10:18 (and anything else it needs) 2008-09-02 10:20 flips: http://pastie.caboo.se/264630 <-- look ok?
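The ileaf dictionary conventions worked out above can be captured in one inline, as suggested at 21:54. The offsets are 16 bits, stored upside down at the end of the block. The 0th offset is implied to be zero, and no `dict[-1]` exists at all when `leaf->count` is zero. This is a sketch under assumptions: `struct ileaf`, `ileaf_dict`, `ileaf_offset` and `ileaf_size` are hypothetical names, and the real tux3 header has more fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ileaf header; the real tux3 layout has more fields. */
struct ileaf {
	uint16_t count;		/* dictionary entries; may be zero */
	uint8_t data[];		/* attribute bytes, growing upward */
};

/* The dictionary sits at the very end of the block and is indexed
 * downward: dict[-1], dict[-2], ... Offset 0 is implied, never stored. */
static uint16_t *ileaf_dict(struct ileaf *leaf, unsigned blocksize)
{
	return (uint16_t *)((char *)leaf + blocksize);
}

/* Start offset (relative to leaf->data) of entry i, for 0 <= i <= count. */
static unsigned ileaf_offset(struct ileaf *leaf, unsigned blocksize, unsigned i)
{
	assert(i <= leaf->count);
	/* i == 0 never touches the dictionary, so count == 0 is safe too. */
	return i ? ileaf_dict(leaf, blocksize)[-(int)i] : 0;
}

/* Size in bytes of entry i's attributes, for 0 <= i < count. */
static unsigned ileaf_size(struct ileaf *leaf, unsigned blocksize, unsigned i)
{
	return ileaf_offset(leaf, blocksize, i + 1) - ileaf_offset(leaf, blocksize, i);
}
```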
2008-09-02 10:25 konrad, also need to check that the attribute list ends exactly at the size limit 2008-09-02 10:25 and do that without accessing out of bounds 2008-09-02 10:25 slightly tricky 2008-09-02 10:25 the neat thing about an rcs like hg is you don't have to ask permission or have a user name 2008-09-02 10:26 it makes commits to the local repo nicer 2008-09-02 10:26 what do you mean by the list ends at the size limit? 2008-09-02 10:27 the attributes are all variable sizes 2008-09-02 10:28 so you need to do that attr = decode(attr...) thing 2008-09-02 10:28 checking that the resulting pointer is not out of range 2008-09-02 10:28 why not just look up the size and check that? 2008-09-02 10:29 sure 2008-09-02 10:29 which is exactly what decode* does 2008-09-02 10:29 ah 2008-09-02 10:29 the magic numbers 6 and 10 should be replaced by constants, you can add those constants to the enum 2008-09-02 10:30 flips: did you see the post about performance on the zumastor list? 2008-09-02 10:30 k 2008-09-02 10:30 mornin' all 2008-09-02 10:30 have not yet 2008-09-02 10:30 hiyah 2008-09-02 10:30 konrad, which post is that 2008-09-02 10:31 konrad: good work, welcome :) 2008-09-02 10:31 flips: which post is what? 2008-09-02 10:31 shapor: thanks 2008-09-02 10:31 oh 2008-09-02 10:31 I'm not on the sumastor list 2008-09-02 10:31 shapor said it 2008-09-02 10:31 :D 2008-09-02 10:31 let me see 2008-09-02 10:32 flips: Subject: Re: RHEL5 2.6.18 support? 2008-09-02 10:33 yes 2008-09-02 10:33 good post 2008-09-02 10:33 and we have the answer: tux3 + backport to zumastor 2008-09-02 10:34 flips: should attr_check fail if the size of an attr is less than 2, or is that allowed? 
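The check requested here needs to confirm that the attribute list ends exactly at the size limit, with no out-of-bounds read along the way. That amounts to validating each header before trusting it. The sketch below follows the `attr_check(void *attrs, unsigned size)` shape agreed on shortly afterwards, taking a `const void *`. The encoding is hypothetical: a 16-bit big-endian header whose top 4 bits hold the kind, plus a per-kind size table that stands in for the "magic numbers 6 and 10". The real iattr.c format may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attribute kinds. Kind zero is deliberately unused, so a
 * zeroed buffer never decodes as a valid attribute. */
enum { MIN_ATTR = 1, CTIME_ATTR = 1, MODE_ATTR = 2, MAX_ATTRS = 3 };

/* Hypothetical total size of each kind, 2-byte header included. */
static const unsigned attr_size[MAX_ATTRS] = {
	[CTIME_ATTR] = 10,
	[MODE_ATTR] = 6,
};

/* Return true if [attrs, attrs + size) holds exactly a sequence of whole,
 * known attributes. Each length is checked before any byte is read. */
static bool attr_check(const void *attrs, unsigned size)
{
	const unsigned char *p = attrs, *limit = p + size;

	while (p < limit) {
		if (limit - p < 2)
			return false;	/* truncated header */
		unsigned kind = p[0] >> 4;	/* top 4 bits of the header */
		if (kind < MIN_ATTR || kind >= MAX_ATTRS || attr_size[kind] < 2)
			return false;	/* unknown kind */
		if ((unsigned)(limit - p) < attr_size[kind])
			return false;	/* body would run past the limit */
		p += attr_size[kind];
	}
	return true;	/* the loop can only exit with p == limit */
}
```

As the log suggests, this returns yes or no and leaves it to the caller to complain.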
2008-09-02 10:34 allowed I think, but there is no attribute with that size 2008-09-02 10:35 right 2008-09-02 10:35 I mean 2 including the header 2008-09-02 10:35 that's a bug 2008-09-02 10:35 which is itself 2 bytes 2008-09-02 10:35 ok, I'll fail if that happens 2008-09-02 10:35 headers are never less than 2 bytes, I don't see changing that 2008-09-02 10:35 we're not quite that insane about compression 2008-09-02 10:35 ok 2008-09-02 10:38 "The long and short of truncate" -- new post coming 2008-09-02 10:39 flips: http://pastie.caboo.se/264644 2008-09-02 10:41 konrad, there can be multiple attributes per leaf entry 2008-09-02 10:41 attr_check should not know about dictionary format at all 2008-09-02 10:41 just take (base, size) 2008-09-02 10:42 hm? 2008-09-02 10:42 to set up a unit test, you need to actually encode some attributes, so this function belongs in iattr.c rather than ileaf.c 2008-09-02 10:42 ah 2008-09-02 10:44 attr_check(void *attrs, unsigned size)? 2008-09-02 10:45 right 2008-09-02 10:45 k 2008-09-02 10:45 would return yes/no I think 2008-09-02 10:45 and the caller would complain 2008-09-02 10:45 maybe 2008-09-02 10:53 hm 2008-09-02 10:53 in encode_attrs() in iattr.c 2008-09-02 10:53 the for loop goes over kind from 0 to 32 2008-09-02 10:53 when kind only gets 4 bits on disk 2008-09-02 10:54 yes, sloppy 2008-09-02 10:54 ;-) 2008-09-02 10:54 :) 2008-09-02 10:54 feel free to improve 2008-09-02 10:54 the reason the lowest attr kind is not zero is, catches more bugs if it isn't 2008-09-02 10:54 right, I saw that earlier 2008-09-02 10:55 I think attr kind zero will only get used when all 15 others are used 2008-09-02 10:55 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-02 10:55 and then it will likely mean "just pad this" 2008-09-02 10:55 heh 2008-09-02 10:55 we might declare at some point that attributes are always padded to even numbers of bytes 2008-09-02 10:56 or we might allow odd numbers 2008-09-02 10:56 then I'd have to recant the
above statement about no attr kind less than 2 bytes 2008-09-02 10:56 we'd introduce at least one one byte attr 2008-09-02 10:56 "noop" 2008-09-02 10:56 or pad 2008-09-02 10:57 so that we can update attrs in some cases without moving everything in the leaf 2008-09-02 10:57 future optimization 2008-09-02 10:57 anyway, just to say the design might evolve a little some weeks down the road 2008-09-02 10:58 for now it is a two-byte granularity 2008-09-02 10:58 that means when immediate data attributes get added, they need to be padded out 2008-09-02 10:59 hey maze 2008-09-02 11:01 flips: more like this? http://pastie.caboo.se/264662 2008-09-02 11:01 exactly like that I think 2008-09-02 11:02 hello 2008-09-02 11:02 hey 2008-09-02 11:02 ready for your vfs tutorial? 2008-09-02 11:04 <- maze 2008-09-02 11:04 right now? no, sorry, I'm in the middle of a big turnup which is already behind schedule 2008-09-02 11:04 konrad, you can use the new enum you just declared for both the lower and upper limit of the encode loop 2008-09-02 11:04 no I meant in general 2008-09-02 11:05 it's going to be a long tutorial ;-) 2008-09-02 11:05 ah ok 2008-09-02 11:05 period of 3 weeks I'd think 2008-09-02 11:05 in general? haven't really had the time :-( to do much - as in almost none. 2008-09-02 11:05 I need a vacation... 2008-09-02 11:05 at the end you get the "phillips certificate of vfs competency" 2008-09-02 11:05 and the right to flame newbies on lkml 2008-09-02 11:05 well worth having 2008-09-02 11:06 cool ;-) I'd love to. 2008-09-02 11:06 ACTION listens carefully and hears the sound of google data centers burning down 2008-09-02 11:06 should be able to run a class of two, shapor is about ready for this 2008-09-02 11:07 konrad too I think 2008-09-02 11:07 I'll listen in certainly 2008-09-02 11:07 they're not burning too quickly at least ;-) 2008-09-02 11:08 maze, did you notice your comments on that fat key space were highly relevant? 
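Two follow-ups from this exchange fit in a few lines. First, bound the `encode_attrs()` loop by the enum rather than 0..32, since the kind gets only 4 bits on disk. Second, pad variable-length attributes, such as future immediate data, out to the current two-byte granularity. This is a sketch only: `MIN_ATTR`, `MAX_ATTRS`, `attr_padded` and `encode_headers` are hypothetical names, and the header layout is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds: the kind is 4 bits on disk, so at most 16 values,
 * with zero unused (it may later mean "just pad this"). */
enum { MIN_ATTR = 1, MAX_ATTRS = 16 };
_Static_assert(MAX_ATTRS <= 16, "attribute kind must fit in 4 bits");

/* Round an encoded attribute length up to two-byte granularity. */
static size_t attr_padded(size_t len)
{
	return (len + 1) & ~(size_t)1;
}

/* Emit a header for each kind whose bit is set in `present`, using the
 * enum for both loop bounds. Returns the number of bytes written. */
static size_t encode_headers(unsigned char *out, unsigned present)
{
	unsigned char *p = out;

	for (unsigned kind = MIN_ATTR; kind < MAX_ATTRS; kind++) {
		if (!(present & (1u << kind)))
			continue;
		*p++ = (unsigned char)(kind << 4);	/* hypothetical: kind in top 4 bits */
		*p++ = 0;
	}
	return (size_t)(p - out);
}
```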
2008-09-02 11:08 hammer essentially implements what you suggested 2008-09-02 11:08 so the idea is far from useless 2008-09-02 11:08 I'm not sure what you're referring to ;-) fat key space? 2008-09-02 11:09 your "beautiful idea" you had after the initial tux3 whiteboarding 2008-09-02 11:09 to incorporate the file offset in the btree key 2008-09-02 11:09 hammer does that 2008-09-02 11:09 ok, right that one 2008-09-02 11:09 just like you imagined 2008-09-02 11:09 I did rather like that one 2008-09-02 11:09 tux3 does not, that is the main difference between them, and the allocation method 2008-09-02 11:10 it makes a beautifully simple design 2008-09-02 11:10 exactly... so why doesn't tux3 use it? 2008-09-02 11:10 but my guess is, tux3 will end up faster as it is more cache efficient to have a two level tree 2008-09-02 11:10 the number of probes is the same 2008-09-02 11:11 hmm, I really think it should work better with just one 2008-09-02 11:11 not probes, but btree compares 2008-09-02 11:11 I ran the numbers in detail 2008-09-02 11:11 hmm, really? interesting. 2008-09-02 11:11 having a single tree means a deeper tree 2008-09-02 11:11 it works out exactly 2008-09-02 11:11 true 2008-09-02 11:11 log(something) either way 2008-09-02 11:11 and it also probably means fewer children per node because of larger keys... 2008-09-02 11:12 it does 2008-09-02 11:12 yes, but... 2008-09-02 11:12 it should spread out better over the entire filesystem 2008-09-02 11:12 hammer: 64 tux3: 256 or 512 2008-09-02 11:12 so instead of access being log(# files) + log(size of file) 2008-09-02 11:12 tux3: sometimes 384 2008-09-02 11:12 you have access being something like (log used disk space) 2008-09-02 11:13 - extents 2008-09-02 11:13 in tux3?
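The fat-key versus two-level argument comes down to tree depth, and depth follows from fanout. A tree with fanout f over n keys needs about ceil(log_f n) levels. Below is a hypothetical back-of-envelope helper for checking that claim with the fanouts quoted in this exchange: 64 for hammer, and 256 or 512 for tux3.

```c
#include <assert.h>
#include <stdint.h>

/* Levels needed for a btree with the given fanout (>= 2) to index n keys:
 * the smallest d >= 1 with fanout^d >= n. Assumes n is far below 2^64, so
 * the running product cannot overflow. */
static unsigned btree_levels(uint64_t n, uint64_t fanout)
{
	unsigned levels = 1;
	uint64_t reach = fanout;

	while (reach < n) {
		reach *= fanout;
		levels++;
	}
	return levels;
}
```

Take some illustrative numbers, which are not measurements. A single fat tree holding 2^30 extents at fanout 64 needs 5 levels. A two-level layout with 2^20 inodes at fanout 256 needs 3 levels, plus 2 more for a 1024-extent file at fanout 256, for a total of 5. The totals match, as the log says. However, the inode-table levels are the ones that tend to stay cached, which leaves 2 uncached levels per lookup.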
2008-09-02 11:13 no comparison of fat vs thin btrees 2008-09-02 11:13 it's mainly log(inode table size) in tux3 2008-09-02 11:14 and the inodes are cached 2008-09-02 11:14 so that disappears mostly 2008-09-02 11:14 leaving the nice little per-file btrees 2008-09-02 11:14 so I guess the two level approach will significantly outperform, just a constant factor but a big one 2008-09-02 11:14 I just want the metadata on a different disk/ram/flash-backed/etc ;-) 2008-09-02 11:15 ah, that is coming 2008-09-02 11:15 my answer to zfs's mess 2008-09-02 11:15 is a rather nice hack 2008-09-02 11:15 I really really like forward logging 2008-09-02 11:15 involving having tux3 work together with lvm3 2008-09-02 11:15 although I hate the fact there's that 0.0001% chance of it breaking 2008-09-02 11:15 the forward logging thing is working out design wise, I should incorporate it into the userspace prototype now 2008-09-02 11:16 the chance is nowhere that big 2008-09-02 11:16 its completely under our control 2008-09-02 11:16 and we will have an option to disable it completely, just for the ultraparanoid 2008-09-02 11:16 the really trick there is to use a sufficiently paranoid checksumming signature 2008-09-02 11:16 with the "phase" commit philosophy, it will still be efficient even without relying on a hash 2008-09-02 11:17 right 2008-09-02 11:17 and even then - it can only fail on non-clean remount 2008-09-02 11:17 and the checksum can be avoided completely in significant cases 2008-09-02 11:17 right again 2008-09-02 11:17 which should be rare... 
so the chance of failure should be as close to '0' as can be while still theoretically possible 2008-09-02 11:18 so, the really nice thing is, when you have a whole bunch of transactions ready to commit, the forward logging can be done without any hash: wait for transaction completions, then mark complete in a known location 2008-09-02 11:18 that will be the ultra paranoid option 2008-09-02 11:19 I would think a 64 bit decent hash would get close to 0 error chance 2008-09-02 11:19 calculating the hashes should be cheap 2008-09-02 11:19 maybe make that configurable 2008-09-02 11:19 yes 2008-09-02 11:19 so long as its not m5 2008-09-02 11:19 we're not talking about calculating the hash from a large amount of data 2008-09-02 11:19 md5 2008-09-02 11:19 or something like that 2008-09-02 11:20 even if it's md5, it's still fast, because it's much-much faster than a disk seek 2008-09-02 11:20 zfs and btrfs find its a significant cost if they checksum everything 2008-09-02 11:20 and you'd be hashing something like 256 bytes or so 2008-09-02 11:20 about the biggest bottleneck in fact 2008-09-02 11:20 its really important to have an efficient hash 2008-09-02 11:21 oh, no, I thought of it as literally a block signature for the superblock 2008-09-02 11:21 not for everything else 2008-09-02 11:32 right 2008-09-02 11:33 I think, just checksum all the _used_ data in the commit block and part of the data blocks 2008-09-02 11:33 right, that'd be nice - use something like a crc32 (cpu support) for that - maybe two crc32's in parallel (or a crc64 if sse will support that) 2008-09-02 11:34 it can be an option whether we rely on the checksum to know that the data part of the transaction got onto media, or wait for completion on data before submitting the commit block 2008-09-02 11:34 cpu support for crs32? 
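In the checksummed variant of the forward-log commit, a cheap 64-bit signature covers the used part of a commit block and is verified at non-clean remount. The sketch uses FNV-1a purely as a stand-in for "a 64 bit decent hash". The log has not settled on a hash, and FNV-1a is not collision-resistant. `commit_seal` and `commit_verify` are hypothetical, and the signature is stored in host byte order for brevity. The ultra-paranoid option avoids the hash entirely: it waits for the transaction's blocks to complete and only then writes the commit mark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* 64-bit FNV-1a, a stand-in for whatever hash tux3 eventually picks. */
static uint64_t fnv1a64(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint64_t h = 0xcbf29ce484222325ULL;	/* offset basis */

	while (len--) {
		h ^= *p++;
		h *= 0x100000001b3ULL;		/* FNV prime */
	}
	return h;
}

/* Hypothetical commit block (size > 8): the last 8 bytes hold a signature
 * over everything before them. Recovery trusts the block only if the
 * signature verifies, so a torn or stale block is rejected. */
static void commit_seal(unsigned char *block, size_t size)
{
	uint64_t sig = fnv1a64(block, size - 8);
	memcpy(block + size - 8, &sig, 8);
}

static bool commit_verify(const unsigned char *block, size_t size)
{
	uint64_t sig;
	memcpy(&sig, block + size - 8, 8);
	return sig == fnv1a64(block, size - 8);
}
```

Each FNV-1a step is a bijection on the 64-bit state, so any change confined to a single byte is guaranteed to alter the signature. Damage spread across several bytes is detected only with high probability.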
2008-09-02 11:34 but I was thinkning of each block in the forward log and the superblock having a sort of tail signature which would look kind of like 2008-09-02 11:34 I don't know that instruction ;-) 2008-09-02 11:34 crc32 - yeap, coming in sse4.1 or so 2008-09-02 11:34 oh, that's too bad 2008-09-02 11:34 should be out in nehalem or even earlier 2008-09-02 11:34 crc32 sucks 2008-09-02 11:34 for hashing 2008-09-02 11:34 well.... 2008-09-02 11:35 you should be able to do a crc32*4 easily enough 2008-09-02 11:35 I hope it's not crc32 specific 2008-09-02 11:35 it is 2008-09-02 11:35 still not good 2008-09-02 11:35 crc32 has funnels 2008-09-02 11:35 lots of them 2008-09-02 11:35 yes, well... 2008-09-02 11:35 bleah 2008-09-02 11:35 ACTION hates intel 2008-09-02 11:35 I wish they supported md5/sha1 and aes in the cpu 2008-09-02 11:35 it's a powerful argument for using a substandard hash 2008-09-02 11:36 double bleah 2008-09-02 11:36 SSE4.2 Instruction Description CRC32 Accumulate CRC32C value using the polynomial 0x11EDC6F41 (or, without the high order bit, 0x1EDC6F41).[5] 2008-09-02 11:37 Nehalem and on, so next year 2008-09-02 11:37 I'll ping a mathematician to analyze it 2008-09-02 11:38 see if we can make something useful out of that turd 2008-09-02 11:38 I'd hate to incorporate crc32 into tux3 on-disk format just because intel farted 2008-09-02 11:38 we'll see what amd comes up with 2008-09-02 11:38 right 2008-09-02 11:38 in fact 2008-09-02 11:38 I know who to talk to about that 2008-09-02 11:39 amd is about to go intel one better 2008-09-02 11:39 lol, how? 
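The SSE4.2 instruction quoted above can be evaluated without Nehalem hardware, because a bitwise software CRC32C computes the same function. That function uses polynomial 0x1EDC6F41, processed here in its bit-reflected form 0x82F63B78. This sketch is meant for reference and testing, not as a proposal for the on-disk format. A CRC is a linear error-detecting code, which is the root of the hashing weaknesses raised in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli). Slow but portable; useful as a reference
 * to check a table-driven or hardware-accelerated version against. */
static uint32_t crc32c(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			/* shift right; XOR in the polynomial if the low bit was set */
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return ~crc;
}
```

`crc32c("123456789", 9)` returns 0xE3069283, the standard CRC-32C check value.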
2008-09-02 11:39 and I'd be happy to run a few cycles slower on intel just to force intel to do it right 2008-09-02 11:39 heh 2008-09-02 11:39 sekrit 2008-09-02 11:39 ok I need to get my mathematical ducks in a row for this 2008-09-02 11:40 AMD claims SSE5 will provide dramatic performance improvements, particularly in high performance computing (HPC), multimedia and computer security applications, including a 5x performance gain for Advanced Encryption Standard (AES) encryption and a 30% performance gain for discrete cosine transform (DCT) used to process video streams.[1] 2008-09-02 11:40 that's more like it 2008-09-02 11:40 I'll go get into the nda loop there 2008-09-02 11:40 AMD's) SSE5 does not include all (Intel's) SSE4 instructions. In other words, it is not a superset of SSE4 but a competitor to it. Likewise, Intels pre-Nehalem cores contain only a partial implementation of SSE4, called SSE4.1. This poses some difficulty and extra work for compilers and assembly-level hand tuning of code 2008-09-02 11:40 make sure amd is a tux-ready machine 2008-09-02 11:43 SSE5 includes: 2008-09-02 11:43 Fused multiply-accumulate (FMACxx) instructions Integer multiply-accumulate (IMAC, IMADC) instructions Permutation (PPERM, PERMPx) and conditional move (PCMOV) instructions Precision control, rounding, and conversion instructions 2008-09-02 11:44 note the permutation stuff 2008-09-02 11:44 probably what gives the aes boost 2008-09-02 11:44 should be useable for hash/crypt stuff as well 2008-09-02 11:45 noted 2008-09-02 11:45 that's the right way to do it 2008-09-02 11:45 it's perfect 2008-09-02 11:45 amd rulez 2008-09-02 11:45 intel suckorz 2008-09-02 11:45 sukzorz 2008-09-02 11:55 the fused multiply will also mean a huge amount for flops freaks everywhere ;-) 2008-09-02 11:56 ie. 
anybody doing anything high-precision 2008-09-02 11:58 :-) 2008-09-02 11:58 ACTION is a flops freak 2008-09-02 12:06 oh, weird, wonder how I managed to do that 2008-09-02 12:07 yes, odd 2008-09-02 12:07 indeed 2008-09-02 12:08 edit without compile most probably 2008-09-02 12:08 thought I did compile though 2008-09-02 12:08 odd 2008-09-02 12:12 scamjet time 2008-09-02 12:16 konrad, I tghi 2008-09-02 12:16 konrad, I think you have ileaf under control 2008-09-02 12:16 dleaf is 10x harder ;-) 2008-09-02 12:16 maybe 100x 2008-09-02 12:17 heh 2008-09-02 12:17 I suggest shapor for code review on that 2008-09-02 12:17 ok 2008-09-02 12:18 ACTION runs and hides 2008-09-02 12:19 not quick enough 2008-09-02 12:25 tux3 is the... 6th google result for tux3 2008-09-02 12:26 I get first 10 2008-09-02 12:27 bbl 2008-09-02 12:28 'tux 3' 2008-09-02 12:33 interesting 2008-09-02 12:33 http://pastie.caboo.se/264727 <-- building tux3 on my ppc machine 2008-09-02 12:42 comes from trace.h 2008-09-02 12:43 flips: ping 2008-09-02 13:58 hey 2008-09-02 14:32 konrad, pong 2008-09-02 14:32 why is there an asm("int3") in trace.h? 2008-09-02 14:37 it generates a trap into gcc on assert failure 2008-09-02 14:37 really useful 2008-09-02 14:37 sorry 2008-09-02 14:37 doesn't work on non-x86 2008-09-02 14:37 :( 2008-09-02 14:37 trap into gdb 2008-09-02 14:37 just comment it out 2008-09-02 14:38 did 2008-09-02 14:38 and hunt around for something that does work 2008-09-02 14:38 it's really useful 2008-09-02 14:38 you can put "b break" into your gdb .rc 2008-09-02 14:38 and void break(void) { } 2008-09-02 14:38 called from assert 2008-09-02 14:43 konrad, what non-x86 do you run on? 2008-09-02 14:43 ppc 2008-09-02 14:43 mac? 
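The `asm("int3")` in trace.h traps into gdb on assertion failure, but it only builds on x86. Below is a sketch of the portable variant suggested above. The names `assert_break`, `debug_trap` and `tux3_assert` are hypothetical, and the code assumes GCC or Clang for `__attribute__` and inline asm. The empty function cannot be called `break`, because that is a C keyword.

```c
#include <assert.h>
#include <stdio.h>

/* Empty, non-inlined function for a debugger to stop on. The empty asm
 * gives it a side effect, so calls to it are not optimized away. */
__attribute__((noinline)) void assert_break(void)
{
	__asm__ volatile ("" ::: "memory");
}

#if defined(__i386__) || defined(__x86_64__)
# define debug_trap() __asm__ volatile ("int3")
#else
# define debug_trap() assert_break()
#endif

/* Hypothetical trace.h-style assert: report the failure, then trap. */
#define tux3_assert(expr) do { \
	if (!(expr)) { \
		fprintf(stderr, "%s:%d: assertion failed: %s\n", \
			__FILE__, __LINE__, #expr); \
		debug_trap(); \
	} \
} while (0)
```

Put `break assert_break` in `~/.gdbinit`, and a failed assertion on ppc then stops in gdb much as `int3` does on x86. Outside a debugger, x86 still dies with SIGTRAP while other architectures print and continue. Add `abort()` after the trap if that difference matters.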
2008-09-02 14:44 ibook 2008-09-02 14:44 cool 2008-09-02 14:44 perfect for checking endian issues 2008-09-02 14:44 and wordsize 2008-09-02 14:44 yep 2008-09-02 14:44 all of ileaf and dleaf have to be converted for endian at some point 2008-09-02 14:44 not right away 2008-09-02 14:53 ACTION is back from Burning Man 2008-09-02 14:53 I feel great 2008-09-02 15:00 me too 2008-09-02 15:01 by the way, what is it that makes you feel great? (only the legal part please) 2008-09-02 15:03 I love this uniden phone system 2008-09-02 15:03 got the 8 series corded base station about 4 years ago 2008-09-02 15:03 its still the best home phone system on the planet 2008-09-02 15:04 just got two new handsets for it, the upgraded 905 series work fine 2008-09-02 15:04 and they're better than the original handsets 2008-09-02 15:04 almost like cell phones 2008-09-02 15:08 flips: I don't do drugs as a rule 2008-09-02 15:08 never really did 2008-09-02 15:08 hard to explain, it's just the overall intensity of the experience 2008-09-02 15:08 like a rage? 2008-09-02 15:09 having such community orientied people really disarms the typical resistence you'd might have dealing with people in a city 2008-09-02 15:09 ah, people not being aholes 2008-09-02 15:09 I get it 2008-09-02 15:09 that's a medium for other things, art, partying, etc... 2008-09-02 15:09 even aholes pretending not to be 2008-09-02 15:09 you'd like it 2008-09-02 15:09 I know I would 2008-09-02 15:09 it's like everthing wrong with US society reversed. 2008-09-02 15:09 kids not compatible I'd think 2008-09-02 15:10 no, folks bring their kids 2008-09-02 15:10 ah 2008-09-02 15:10 then next year for sure 2008-09-02 15:10 it's not a big deal, just avoid certain camps and you're set 2008-09-02 15:10 certain camps where... what? is happening 2008-09-02 15:10 they aren't exhibiting that stuff openly anyways, so it's no big deal 2008-09-02 15:10 death yoga? 
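All of ileaf and dleaf will eventually need endian conversion, and the log later suggests macroizing the `leaf->count` conditionals to ease that work. One common approach is to wrap on-disk fields in a distinct type, so that a missing conversion fails to compile instead of only showing up on the iBook. This sketch assumes big-endian on-disk fields, which the log does not specify. `be16`, `from_be16`, `to_be16`, `struct disk_leaf` and `leaf_count` are hypothetical. In the kernel the equivalent would be `__be16` with `be16_to_cpu()`, checked by sparse.

```c
#include <assert.h>
#include <stdint.h>

/* A 16-bit big-endian on-disk field. Wrapping it in a struct means that
 * using it as a plain integer without conversion is a compile error. */
typedef struct { unsigned char b[2]; } be16;

static inline uint16_t from_be16(be16 v)
{
	return (uint16_t)(v.b[0] << 8 | v.b[1]);
}

static inline be16 to_be16(uint16_t x)
{
	return (be16){ { (unsigned char)(x >> 8), (unsigned char)x } };
}

/* Hypothetical leaf header using the wrapped type. */
struct disk_leaf {
	be16 count;
};

/* Single conversion point for every conditional that tests the count. */
static inline unsigned leaf_count(const struct disk_leaf *leaf)
{
	return from_be16(leaf->count);
}
```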
2008-09-02 15:10 porn & eggs 2008-09-02 15:10 spike's 2008-09-02 15:10 stuff like that 2008-09-02 15:10 ic 2008-09-02 15:11 right 2008-09-02 15:12 not any worse than a goth festival I'd think 2008-09-02 15:13 you'd like that 2008-09-02 15:13 german version 2008-09-02 15:13 not really 2008-09-02 15:13 I guarantee it 2008-09-02 15:13 for one thing, there's a high concentration of ubergeeks 2008-09-02 15:14 yeah, your infrastructure engineering group is out there for sure, Tim Hockin 2008-09-02 15:14 the death guild camp is full f nerds as well 2008-09-02 15:14 larry & sergey even 2008-09-02 15:17 handset #4 now online, my home pbx is good for another 2 years 2008-09-02 15:18 going to celebrate with some french roast 2008-09-02 15:20 how's tux3 going ? 2008-09-02 15:20 any of my suggestions been thought about futher ? 2008-09-02 15:20 further ? 2008-09-02 15:20 oh yes 2008-09-02 15:21 I'm getting ready to set up a nice environment for you to develop the locking ;-) 2008-09-02 15:21 you'll see growth of the project with more folks joining when you get more stuff working 2008-09-02 15:21 oh shit 2008-09-02 15:21 that's true 2008-09-02 15:21 it's already happening 2008-09-02 15:21 good 2008-09-02 15:21 major stuff now works, see tux3.c 2008-09-02 15:21 yeah, because I don't have faith in Linux file systems after seeing a bunch of NetApp code 2008-09-02 15:21 can create and read/write a tux3 volume from shell commands now 2008-09-02 15:22 nice 2008-09-02 15:22 really did make a 64 petabyte file in an 8k volume image 2008-09-02 15:22 that's with 4K spare for the boot loader 2008-09-02 15:23 decided to make the tux3 superblock 1K just to have that work out ;-) 2008-09-02 15:23 that leaves 12 256 byte blocks for the filesystem structure, root directory, bitmaps, inode table 2008-09-02 15:23 is this all you're doing at Google right now ? 
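The 8K volume image figures above add up as follows, with 4K reserved for the boot loader and 1K for the superblock:

```latex
8192 - 4096 - 1024 = 3072 = 12 \times 256
```

These are the twelve 256-byte blocks left for the filesystem structure, root directory, bitmaps and inode table. The 64 petabyte file in that volume is necessarily sparse, since almost none of its blocks are allocated.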
2008-09-02 15:23 you could say that 2008-09-02 15:24 but its actually part time 2008-09-02 15:24 you should see me when I work ;-) 2008-09-02 15:29 "I don't like the flashing red light in the upper left hand corner of each handset. This is a charge indicator that lets you know the phone is charged and ready to go. There is nothing wrong with letting consumers know this, but to have a light that continuously flashes can be a tremendous distraction." -- amazon idiot who doesn't know he owns a digital answering machine 2008-09-02 15:36 maybe I will pthread tux3 before doing delete 2008-09-02 15:36 just for bh 2008-09-02 15:40 flips: how fine-grained are you planning on going with locking? 2008-09-02 15:40 very 2008-09-02 15:40 ask bh ;-) 2008-09-02 15:40 leaf? 2008-09-02 15:40 yes 2008-09-02 15:40 hrm will you do that in the generic btree code? 2008-09-02 15:40 yes 2008-09-02 15:40 with the help of pthreads 2008-09-02 15:40 and futexes 2008-09-02 15:41 where are you planning on storing the locks? 2008-09-02 15:41 bh is going to have fun with it ;-) 2008-09-02 15:41 in the buffer heads 2008-09-02 15:41 or in a hash 2008-09-02 15:41 it's in flux 2008-09-02 15:41 either would work in kernel 2008-09-02 15:41 so i'm guessing locks in the intermediate nodes as well? 
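Per-node locks in the generic btree code, including the intermediate nodes that split and merge touch, are commonly taken with lock coupling (also called crabbing). Each child is locked before its parent is released, always in root-to-leaf order. The sketch shows that standard technique, with pthread mutexes standing in for locks kept in buffer heads or a hash. The log does not say tux3 uses exactly this scheme, and `struct node` and `descend_locked` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical in-memory btree node; the mutex stands in for a lock kept
 * in the buffer head (or in a hash keyed by block number). */
struct node {
	pthread_mutex_t lock;
	int leaf;		/* nonzero for a leaf */
	unsigned count;		/* separator keys in an index node */
	unsigned keys[4];	/* ascending */
	struct node *child[5];
};

/* Descend with lock coupling: lock the child, then unlock the parent.
 * Every thread takes locks in the same root-to-leaf order, so no two
 * threads can each hold one lock while waiting for the other's (ABBA).
 * Returns the leaf, still locked. */
static struct node *descend_locked(struct node *root, unsigned key)
{
	struct node *node = root;

	pthread_mutex_lock(&node->lock);
	while (!node->leaf) {
		unsigned i = 0;
		while (i < node->count && key >= node->keys[i])
			i++;
		struct node *next = node->child[i];
		pthread_mutex_lock(&next->lock);
		pthread_mutex_unlock(&node->lock);
		node = next;
	}
	return node;
}
```

A descent that may split or merge keeps the parent locked until it knows the child cannot push a change upward. Either way, the fixed top-down order is what rules out ABBA deadlock.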
2008-09-02 15:42 for merge/split 2008-09-02 15:42 yes 2008-09-02 15:42 careful about deadlocks there 2008-09-02 15:42 all down the chain 2008-09-02 15:42 always 2008-09-02 15:42 anybody who thinks abba is a swedish pop group is not touching the locking code 2008-09-02 15:42 lol 2008-09-02 15:43 bh knows that stuff I'm pretty sure 2008-09-02 15:43 didn't ask, but what he talks about is beyond that 2008-09-02 15:45 hrm how about transactional stuff 2008-09-02 15:46 like where you have to create an inode, then reference from a directory 2008-09-02 15:46 which involves more than one tree 2008-09-02 15:47 I'll write it up in a few days 2008-09-02 15:47 it's pretty much all there in the hammer thread 2008-09-02 15:47 we track every time a buffer gets dirty 2008-09-02 15:47 i still haven't had the time to digest that whole brain dump 2008-09-02 15:48 then either add it to the current transaction phase or cow the buffer 2008-09-02 15:48 it's basically the phase part of phase tree, the part that netapp never tried to own 2008-09-02 15:50 cowing the buffer is a simple matter of setting its index to some other physical block 2008-09-02 15:50 or in the case of file blocks, changing the pointer in its parent 2008-09-02 15:50 index block 2008-09-02 15:50 which is done only in cache 2008-09-02 15:50 not on disk 2008-09-02 15:51 so you have one view of the vs on disk, and another, current one that the vfs sees, in memory 2008-09-02 15:51 of the fs I mean 2008-09-02 15:51 when you get the aha on that it's going to be fun 2008-09-02 15:52 I think I'll use the term "fork" instead of cow 2008-09-02 15:53 it's much more descriptive of what happens 2008-09-02 15:53 so tux3's transaction model is to fork any buffer written to after a phase is closed 2008-09-02 15:53 if the phase is still open, just write to it normally 2008-09-02 15:54 unspeakably efficient 2008-09-02 15:54 tux3 has exactly two ways of getting info onto media 1) write to a buffer 2) save the superblock 2008-09-02
15:54 there will eventually be 3) directio 2008-09-02 15:55 which will require more fiddling 2008-09-02 15:56 I wonder if it would be worth the very minor regularity improvement to hold the superblock in a buffer 2008-09-02 15:56 well 2008-09-02 15:56 kind of dumb 2008-09-02 15:56 you don't know the block size for the superblock 2008-09-02 15:56 or 2008-09-02 15:56 more accurately, the blocksize of the superblock may not match the buffer cache blocksize 2008-09-02 15:57 or the filesystem blocksize 2008-09-02 15:57 both making it unnatural to force the sb into a buffer 2008-09-02 15:58 I think we may be studly and to the initial sb load and later saves directly via the bio interface 2008-09-02 15:58 which means we need to handle completion, get the interrupt back into foreground 2008-09-02 15:58 interrupt completion that is 2008-09-02 15:59 which we need to do anyway if we want to avoid the decrepit old block io library 2008-09-02 16:06 http://interviews.slashdot.org/comments.pl?sid=950917&cid=24845533 2008-09-02 16:20 all the remaining conditional exprs in ileaf.c involve leaf->count, there has to be a way to make a macro 2008-09-02 16:20 macroizing those will be a big help in easing the pain of endian conversion 2008-09-02 16:25 ACTION picks up konrad's cute negative for loop for dleaf_trunc 2008-09-02 16:25 I think I grabbed it from somewhere in ileaf.c 2008-09-02 16:25 really? 2008-09-02 16:25 or maybe that was my imagination 2008-09-02 16:25 yeah 2008-09-02 16:25 looks original 2008-09-02 16:26 I had something remotely like it 2008-09-02 16:26 but yours is actually readable 2008-09-02 16:26 ileaf->dump 2008-09-02 16:26 er 2008-09-02 16:26 ileaf_dump 2008-09-02 16:26 same thing 2008-09-02 16:26 roughly 2008-09-02 16:26 oh heh 2008-09-02 16:26 I forget some of the stuff I write :-) 2008-09-02 16:26 :D 2008-09-02 16:27 yours is better 2008-09-02 16:27 it's how I should have written it 2008-09-02 16:27 I'll change ileaf_dump to match, or do you want to do that? 
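The transaction model described above reduces to a small check on the write path: write in place while a buffer's phase is open, and fork the buffer once that phase has closed. This is a sketch under stated assumptions. `struct buffer`, `alloc_block` and `buffer_for_write` are hypothetical, and the blocks are tiny. Where a newly dirtied buffer lands on disk is decided at flush time and is not shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cached buffer: a tiny data block, its physical location,
 * and the phase in which it was last dirtied (0 = clean). */
struct buffer {
	unsigned phase;
	unsigned long long blocknr;
	unsigned char data[64];
};

/* Hypothetical allocator handing out fresh block numbers. */
static unsigned long long next_free = 1000;
static unsigned long long alloc_block(void)
{
	return next_free++;
}

/* Return a buffer that the open phase `now` may modify.
 * Clean, or already dirty in this phase: join the phase, write in place.
 * Dirty in an earlier, closed phase: fork it. The original stays intact
 * for the commit in flight; the caller repoints the cache index (or the
 * parent index block) at the copy. Nothing is written to disk here. */
static struct buffer *buffer_for_write(struct buffer *buf, unsigned now)
{
	if (buf->phase == 0 || buf->phase == now) {
		buf->phase = now;
		return buf;
	}
	struct buffer *copy = malloc(sizeof *copy);
	if (!copy)
		return NULL;
	memcpy(copy, buf, sizeof *copy);
	copy->phase = now;
	copy->blocknr = alloc_block();	/* the fork gets another physical block */
	return copy;
}
```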
2008-09-02 16:27 go ahead 2008-09-02 16:27 I'm attempting to wrap my head around dleaf 2008-09-02 16:27 good 2008-09-02 16:27 don't bother with ilead :-) 2008-09-02 16:27 dleaf is pure braindamange, ask shapor 2008-09-02 16:28 of the good kind 2008-09-02 16:28 it will make your head hurt 2008-09-02 16:28 heh 2008-09-02 16:32 u16 *gdict = (void *)leaf + btree->sb->blocksize; 2008-09-02 16:32 u16 *edict = (void *)(gdict - leaf->groups); 2008-09-02 16:32 more regular form 2008-09-02 16:32 plus a cute varname 2008-09-02 16:34 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-02 16:35 hey tim_dimm 2008-09-02 16:35 hey flips 2008-09-02 16:35 you can shell in any time ;-) 2008-09-02 16:35 no time to be stealthy, huh 2008-09-02 16:35 that's stealthy 2008-09-02 16:35 yup 2008-09-02 16:36 flips: shouldn't those be u32? 2008-09-02 16:36 dcc still doesn't work 2008-09-02 16:36 nat issue 2008-09-02 16:36 konrad, which? 2008-09-02 16:36 [16:32:04] u16 *gdict = (void *)leaf + btree->sb->blocksize; 2008-09-02 16:36 [16:32:04] u16 *edict = (void *)(gdict - leaf->groups); 2008-09-02 16:36 um 2008-09-02 16:36 oh yes 2008-09-02 16:37 shows I cut n pasted 2008-09-02 16:37 without engaging brain 2008-09-02 16:37 heh 2008-09-02 16:37 I'm gonna borrow those 2008-09-02 16:37 they are actually struct something * 2008-09-02 16:37 well yeah 2008-09-02 16:37 but u32 is the same size as said struct 2008-09-02 16:38 struct entry and struct group I think 2008-09-02 16:38 right, a dangerous coindicdence 2008-09-02 16:38 but safe in this case 2008-09-02 16:39 the cutest thing about dleaf is the way the entry offset is incremented inthe lookup loop 2008-09-02 16:39 that is where the brain hurt gets serious 2008-09-02 16:40 the for loops using struct pointers are gratuitous 2008-09-02 16:40 flips: remember to think about the allocator so that related bits of metadata are located closely to each other, this is very important for online disk checking 2008-09-02 16:40 
it's more clear using array indices 2008-09-02 16:40 and the complier can optimize to the same thing in theory 2008-09-02 16:40 practice is different of course ;-) 2008-09-02 16:41 the ext3 paper, OLS 2007 ?, might be of interest here, they made a modification to ext3 so that fsck would runs much faster 2008-09-02 16:41 bh, you haven't been reading the recent posts ;-) 2008-09-02 16:41 talke about that very thing 2008-09-02 16:41 gcc -O9999999 linux.c 2008-09-02 16:41 did you get to read the paper btw ? 2008-09-02 16:41 :-) 2008-09-02 16:41 bh, I'm one of the stars in it ;-) 2008-09-02 16:41 yeah, I'm overloaded with -rt work right now, first day back 2008-09-02 16:41 downloaded it last week 2008-09-02 16:41 folks are hitting me up for stuff already 2008-09-02 16:42 oh really ? 2008-09-02 16:42 url ? 2008-09-02 16:42 yep 2008-09-02 16:42 um 2008-09-02 16:42 http://ext2.sourceforge.net/2005-ols/2005-ext3-paper.pdf 2008-09-02 16:43 getting close to sk8 oclock 2008-09-02 16:44 ACTION does another piece of 75% cacao chocolate 2008-09-02 16:44 tim_dimm might have a skate left in him 2008-09-02 16:45 went through 30 wheels in 4 days at Maryhill 2008-09-02 16:45 I'll be on the strand by 5:30 2008-09-02 16:45 about 2008-09-02 16:45 wow 2008-09-02 16:45 http://www.silverfishlongboarding.com/option,com_gallery2/Itemid,53/?g2_itemId=237609/ 2008-09-02 16:45 "vintage plantation" <- I highly recommend this chocolate 2008-09-02 16:46 i'm the one *not* in the rubber suit 2008-09-02 16:46 flips: there might be a newer paper on the matter from IBM 2008-09-02 16:46 beautiful 2008-09-02 16:46 bh, link? 
2008-09-02 16:46 OLS 2007 or something like that 2008-09-02 16:46 I'll need a better hint 2008-09-02 16:47 tim, you're the one who looks cool 2008-09-02 16:47 except you need a mirrored helment 2008-09-02 16:47 helmet 2008-09-02 16:48 if you kept your elbows in I bet you woulda won 2008-09-02 16:48 and spray some pam on that jacket 2008-09-02 16:49 ACTION isn't very keen on online disk fragmention either in ext4 2008-09-02 16:49 seems kind of like bottom scrapping to me 2008-09-02 16:50 I was trying to grab some air at that point. Those guys just passed me, and I knew they were about to slam on the brakes. 2008-09-02 16:51 variable metadata is useful for homogenous file types like media files, hmmm, interesting 2008-09-02 17:05 bh, you really need to read my musings 2008-09-02 17:05 let me see if I can find a subject line 2008-09-02 17:06 scott on the right? 2008-09-02 17:06 in blue, yes 2008-09-02 17:06 how'd I guess ;) 2008-09-02 17:06 f'n magic 2008-09-02 17:06 shinyness 2008-09-02 17:07 serious about the pam 2008-09-02 17:07 slickness 2008-09-02 17:07 on the list ? 2008-09-02 17:07 should be 2008-09-02 17:07 yes 2008-09-02 17:07 I look at it a bit, but I didn't see very much 2008-09-02 17:07 or lkml ? 2008-09-02 17:07 the list 2008-09-02 17:08 tux3 ? 2008-09-02 17:08 "Spacial correlation between directory entries, inodes and file data" 2008-09-02 17:08 you have to read between the lines 2008-09-02 17:08 all I see is stuff about patches 2008-09-02 17:08 I have a followup post in the works 2008-09-02 17:08 but there is stuff ahead of it 2008-09-02 17:08 in the queue 2008-09-02 17:08 flips: how does the magic zero entry worth with the dleaf dicts? 2008-09-02 17:08 or is it present? 
2008-09-02 17:09 konrad, same way 2008-09-02 17:09 0th entry is implied 2008-09-02 17:09 dict should be positioned one past the top of the list 2008-09-02 17:09 flips: you should make online disk checking the default mechanism for your file system, create a common fsck library to shared between the online checker and offline 2008-09-02 17:09 that is violated in dleaf.c sometimes for no good reason 2008-09-02 17:09 just because we were figuring out how to do it at the time 2008-09-02 17:09 offline checking would be used only in a dev situation 2008-09-02 17:09 bh, planned 2008-09-02 17:09 indeed 2008-09-02 17:09 good 2008-09-02 17:10 need to write a tech note 2008-09-02 17:10 ah ok 2008-09-02 17:10 the tux3 userspace implementation is in fact the base of the online tools 2008-09-02 17:10 because until we get reverse pointers and supporting stuff for file systems that's the only things that's going to work 2008-09-02 17:10 including defrag 2008-09-02 17:10 online and offline 2008-09-02 17:10 volume are getting so large that .... you know... 2008-09-02 17:11 reverse pointers is planned, tech note needed 2008-09-02 17:11 I've mentioned some details from time to time 2008-09-02 17:11 I know 2008-09-02 17:11 it's already broken 2008-09-02 17:11 broke years ago 2008-09-02 17:11 tux3 is going to be allocation groups as well 2008-09-02 17:11 and maybe... not sure about it... 
relative pointers 2008-09-02 17:12 maybe that is tux3.1 2008-09-02 17:12 don't know, it's too experimental 2008-09-02 17:12 right 2008-09-02 17:12 scary 2008-09-02 17:12 get the basics as much as you can first, format changes are another matter 2008-09-02 17:12 http://pastie.caboo.se/264894 2008-09-02 17:12 like that 2008-09-02 17:12 my dumper "from scratch" if you will 2008-09-02 17:12 that's the plan 2008-09-02 17:12 so I think I'm doing something right 2008-09-02 17:13 konrad, kool 2008-09-02 17:13 oh yes 2008-09-02 17:14 if I go out for a skate, your new dumper will be finished when I get back and I can use it 2008-09-02 17:14 hm? 2008-09-02 17:14 it's sort of redundant to the existing dleaf_dump 2008-09-02 17:14 I just wanted to be sure I understand how to loop through the groups 2008-09-02 17:14 er, entries 2008-09-02 17:14 and groups 2008-09-02 17:15 yours is going to be better, I like to backport like that 2008-09-02 17:15 it's called evolution 2008-09-02 17:17 should I make the output look like the old one? 2008-09-02 17:17 good place to start 2008-09-02 17:17 k time to get rolling 2008-09-02 17:19 what's the purpose of (struct entry*)foo->limit ? 2008-09-02 17:19 flips: tying up some loose ends. I'll be out by 6 2008-09-02 17:21 ok, I'll slow down a little 2008-09-02 17:21 see you at the skate park? 2008-09-02 17:21 sure 2008-09-02 17:22 I'll do slaloms at the pier for a while ;-)_ 2008-09-02 17:22 more fun than slowing down 2008-09-02 17:28 flips: stuff posted today on lkml ? 
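The "0th entry is implied" convention can be pictured as a toy dict that grows downward from the end of a block. The dict pointer sits one past the top, so stored element k lives at `top[-k]`, and element 0 is never stored or dereferenced. This is a minimal sketch that assumes a hypothetical 64-byte block and a dict of plain u8 values. `struct sk_dict`, `dict_push` and `dict_get` are made-up names and do not reflect the dleaf.c layout.

```c
#include <assert.h>
#include <stdint.h>

#define SK_BLOCKSIZE 64   /* hypothetical tiny block, for illustration only */

/* Hypothetical downward-growing dict of u8 values at the top of a block.
 * "top" points one past the last byte of the block, so stored element k
 * (k = 1..n) lives at top[-k]; element 0 is implied and never stored. */
struct sk_dict {
	uint8_t *top;
	int n;            /* number of stored elements */
};

void dict_init(struct sk_dict *d, uint8_t *block)
{
	d->top = block + SK_BLOCKSIZE;
	d->n = 0;
}

/* Append below the current lowest element; returns 0, or -1 if full. */
int dict_push(struct sk_dict *d, uint8_t value)
{
	if (d->n + 1 > SK_BLOCKSIZE)
		return -1;
	d->top[-(++d->n)] = value;
	return 0;
}

/* Logical element k in 0..n; element 0 is the implied zero and is
 * answered without touching memory (top[0] is outside the block). */
unsigned dict_get(const struct sk_dict *d, int k)
{
	assert(k >= 0 && k <= d->n);
	return k == 0 ? 0 : d->top[-k];
}
```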
2008-09-02 17:28 bh, not today 2008-09-02 17:28 soon 2008-09-02 17:28 oh ok, so you haven't posted this yet then 2008-09-02 17:29 mainly just on the current state of the disk format 2008-09-02 17:29 ok 2008-09-02 17:29 "Spacial correlation between directory entries, inodes and file data" 2008-09-02 17:29 (read between the lines) 2008-09-02 17:29 it's working out well as far as it goes 2008-09-02 17:29 there's a lot more detail coming on that 2008-09-02 17:30 read the hint about generating functions 2008-09-02 17:30 spatial 2008-09-02 17:30 I've blabbed about that to you personally, but I don't know if it registered yet 2008-09-02 17:30 right 2008-09-02 17:30 spacial is my new word ;-) 2008-09-02 17:30 I googled for that and go nothing useful 2008-09-02 17:30 it's on the tux3 list 2008-09-02 17:31 I totally don't see it 2008-09-02 17:31 it's just patch discussion that I'm seeing 2008-09-02 17:31 you're right 2008-09-02 17:31 google is damaged or mailman 2008-09-02 17:33 http://tux3.org/pipermail/tux3/2008-August/000083.html 2008-09-02 17:33 google is braindamaged 2008-09-02 17:33 :-p 2008-09-02 17:33 later... 
2008-09-02 17:44 ACTION reading 2008-09-02 17:53 ok, dumper2 worsk 2008-09-02 18:45 -!- caoliver(~oliver@75-134-208-20.dhcp.trcy.mi.charter.com) has joined #tux3 2008-09-02 18:45 -!- caoliver(~oliver@75-134-208-20.dhcp.trcy.mi.charter.com) has left #tux3 2008-09-02 19:35 -!- caoliver(~oliver@75-134-208-20.dhcp.trcy.mi.charter.com) has joined #tux3 2008-09-02 19:36 -!- caoliver(~oliver@75-134-208-20.dhcp.trcy.mi.charter.com) has left #tux3 2008-09-02 19:50 konrad, kool 2008-09-02 19:50 I'm not sure it's any more clear than the original 2008-09-02 19:51 it's 3 lines shorter, but that doesn't mean clearer 2008-09-02 20:01 it would be hard to be less clear than the original 2008-09-02 20:03 eh, it's not an easy process 2008-09-02 20:04 -easy +simple 2008-09-02 20:07 it's going to get even less easy when we add in versioned extents 2008-09-02 20:07 to the same code 2008-09-02 20:07 so it has to be clean 2008-09-02 20:12 mhm 2008-09-02 20:12 well, mine uses less pointer arithmetic and more array notation 2008-09-02 20:13 *(a + b) vs a[b] 2008-09-02 20:15 I think that's better 2008-09-02 20:15 for the dumper 2008-09-02 20:15 easier to read 2008-09-02 20:15 can save the pointer tricks for something that matters 2008-09-02 20:15 assuming the compiler can't optimize that well 2008-09-02 20:16 which is not a safe assumption 2008-09-02 20:18 flips: http://pastie.caboo.se/264975 there's a first poke at it 2008-09-02 20:19 not sure I like the ent == -1 logic 2008-09-02 20:20 otherwise looks pretty good 2008-09-02 20:20 offset -= doesn't look right 2008-09-02 20:20 should be += 2008-09-02 20:20 wow, dleaf_isinorder looks nice 2008-09-02 20:21 ent == -1 is the same as the check against the magical zero 2008-09-02 20:21 right 2008-09-02 20:21 ent < -1 ? 2008-09-02 20:21 something seems off by one 2008-09-02 20:21 hm? 2008-09-02 20:22 as in, for every entry but the first entry, check against the previous entry 2008-09-02 20:22 no? 2008-09-02 20:22 how can ent be < -1 ? 
2008-09-02 20:22 ent grows smaller? 2008-09-02 20:22 -1, -2, -3 2008-09-02 20:22 oh right :-) 2008-09-02 20:22 (-3 < -1) 2008-09-02 20:22 upside down 2008-09-02 20:22 => true 2008-09-02 20:22 yeah :) 2008-09-02 20:22 braindamage 2008-09-02 20:22 sory 2008-09-02 20:22 same with offset -= 2008-09-02 20:22 offset is negative 2008-09-02 20:22 grows smaller 2008-09-02 20:22 should be ent < 0 though 2008-09-02 20:22 let's see what the skew is 2008-09-02 20:23 hmm 2008-09-02 20:23 maybe it's my brain 2008-09-02 20:23 ah 2008-09-02 20:23 I see 2008-09-02 20:23 I'm assuming the -1th element is greater than zero 2008-09-02 20:23 which isn't good 2008-09-02 20:23 true 2008-09-02 20:23 I think you want a structure where you assign a variable once outside the loop 2008-09-02 20:24 like you had in ileaf 2008-09-02 20:24 alright 2008-09-02 20:24 it's not bad 2008-09-02 20:24 but given how important it is, it should be crystalline 2008-09-02 20:24 yes 2008-09-02 20:25 ok, I can read it now 2008-09-02 20:25 you're right 2008-09-02 20:25 you need to get the correctness of the first value 2008-09-02 20:25 then induce correctness from there 2008-09-02 20:26 and all the little bits have to join together 2008-09-02 20:26 so you need to init your first value outside both loops 2008-09-02 20:26 you can safely init it to zeor 2008-09-02 20:27 zero 2008-09-02 20:27 since keys are unsigned 2008-09-02 20:27 flips: http://pastie.caboo.se/264977 2008-09-02 20:27 oh, keys are unsigned? 
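The advice to assign the comparison value once, outside the loop, can be sketched as follows. Seeding `prev` with zero is safe because the values are unsigned. It also removes any need to read a "-1th" element. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if vals[0..n-1] is non-descending, else 0.
 * prev is initialized once, before the loop: the values are unsigned,
 * so 0 is a valid lower bound and no element before the first is read. */
int is_nondescending(const uint64_t *vals, size_t n)
{
	uint64_t prev = 0;

	for (size_t i = 0; i < n; i++) {
		if (vals[i] < prev)
			return 0;
		prev = vals[i];
	}
	return 1;
}
```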
2008-09-02 20:28 not crystalline yet ;-) 2008-09-02 20:28 keys are 2008-09-02 20:28 u64 2008-09-02 20:28 see tuxkey_t 2008-09-02 20:28 wait 2008-09-02 20:28 but I'm testing limits 2008-09-02 20:28 not keys 2008-09-02 20:28 they're keys 2008-09-02 20:28 limits are u8 2008-09-02 20:28 that's what dleaf is 2008-09-02 20:28 a key dict 2008-09-02 20:28 right 2008-09-02 20:29 you have to expand those u8s to keys 2008-09-02 20:29 that's the clever thing here 2008-09-02 20:29 ah, I'm just doing what I did in ileaf_isinorder 2008-09-02 20:29 making sure the limits are non-descending 2008-09-02 20:29 when you get the aha it's going to be a big one ;-) 2008-09-02 20:29 which is still important 2008-09-02 20:29 we have two 48 bit fields we combine to make a key 2008-09-02 20:29 two 24 bit fields 2008-09-02 20:29 and the 8 bit fields are just indexes to allow us to do that 2008-09-02 20:29 to make a 48 bit key 2008-09-02 20:29 right 2008-09-02 20:30 sorry 2008-09-02 20:30 so you need to assemble the 48 bit key at each step and compare to prevkey 2008-09-02 20:30 ah, and it should be greater always? 2008-09-02 20:30 yes 2008-09-02 20:30 non-descending or ascending? 2008-09-02 20:30 the 8 bit fields within groups also ascend 2008-09-02 20:30 can two keys be the same? 2008-09-02 20:30 which is what your code checks 2008-09-02 20:30 right 2008-09-02 20:30 which is also good 2008-09-02 20:31 alright 2008-09-02 20:31 twe keys can be the same 2008-09-02 20:31 that's going to be critically important 2008-09-02 20:31 nondescending 2008-09-02 20:32 assembling the 48 bit key is pretty easy 2008-09-02 20:32 it's just 24 bits from the entry and the other 24 bits from the group that owns it 2008-09-02 20:32 computing the offset is a little trickier 2008-09-02 20:32 offset into data 2008-09-02 20:35 http://pastie.caboo.se/264980 checking offsets within groups and non-descending keys now 2008-09-02 20:36 it triggers 3 times running ./dleaf 2008-09-02 20:36 triggers? 
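The 48-bit key described above takes 24 bits from the group and 24 bits from the entry. Assembling it can be sketched like this. The `tuxkey_t` typedef mirrors the u64 mentioned in the chat. `make_key` and the masking are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t tuxkey_t;   /* keys are u64, per the chat */

/* Combine the group's upper 24 bits with the entry's lower 24 bits.
 * The cast to tuxkey_t must come before the shift: a 24-bit value
 * shifted left by 24 does not fit in a 32-bit unsigned int. */
tuxkey_t make_key(uint32_t loghi, uint32_t loglo)
{
	return ((tuxkey_t)(loghi & 0xffffff) << 24) | (loglo & 0xffffff);
}
```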
2008-09-02 20:37 dleaf_check returns negative with "dleaf entries out of order!" as the error message 2008-09-02 20:37 probably not because of a bug in dealf.c 2008-09-02 20:37 or should I say, "possibly" 2008-09-02 20:38 I still don't much like the inits to -1 2008-09-02 20:38 well 2008-09-02 20:38 I think I see 2008-09-02 20:38 you want a do ( } while (cond) structure 2008-09-02 20:38 probably 2008-09-02 20:39 so the loop iterates over the final n-1 elements 2008-09-02 20:39 it is never allowed to have zero iterations 2008-09-02 20:39 so return false if you find that before entering the do loop 2008-09-02 20:40 so groups aren't allowed to have zero entries? 2008-09-02 20:40 right 2008-09-02 20:40 I should write the definition 2008-09-02 20:40 and post it 2008-09-02 20:41 about time 2008-09-02 20:41 the comment is a little lame 2008-09-02 20:41 in editing a dleaf, and group that drops to zero has to be deleted immediately 2008-09-02 20:42 s/and/any/ 2008-09-02 20:42 what sort of formatting do you prefer for do/while loops? 2008-09-02 20:42 hmm 2008-09-02 20:43 lindent 2008-09-02 20:43 that's with the first curly on the same line as the do 2008-09-02 20:43 I don't like it, but linus does 2008-09-02 20:43 used to write them like you 2008-09-02 20:43 and the second curly on the same or different line as the while? 2008-09-02 20:44 but in the end there is no way to make c pretty ;-) 2008-09-02 20:44 heh 2008-09-02 20:44 different line 2008-09-02 20:44 ok 2008-09-02 20:47 hm, the implied zero entry, what loglo does it have? zero? 
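Below is a sketch of the do/while shape suggested above, combined with the rule that a group may never have zero entries. It makes two hypothetical assumptions: each group's u8 limits are stored back to back, and the limits restart at each group. `groups_ok` is an illustrative name only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* counts[g] is the entry count of group g; limits holds every group's
 * limits back to back. Returns 1 if valid, 0 otherwise. */
int groups_ok(const uint8_t *counts, size_t ngroups, const uint8_t *limits)
{
	for (size_t g = 0; g < ngroups; g++) {
		size_t i = 0;
		unsigned prev = 0;

		if (counts[g] == 0)
			return 0;   /* a group that drops to zero must be deleted */
		/* do/while is safe: at least one entry was checked for above */
		do {
			if (limits[i] < prev)
				return 0;
			prev = limits[i];
		} while (++i < counts[g]);
		limits += counts[g];   /* move on to the next group's limits */
	}
	return 1;
}
```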
2008-09-02 20:54 um 2008-09-02 20:54 ACTION thinks 2008-09-02 20:54 it's not actually there 2008-09-02 20:54 that's where the aha happens 2008-09-02 20:55 only the nonzero entries are actually there, and they encode the upper bound from the key, rather than the usual offset 2008-09-02 20:55 we start one entry away and have an implied zero because we are picking up a pair of entries at each step 2008-09-02 20:56 the current entry and the one above in a sense 2008-09-02 20:56 or maybe better to think of it as the current entry and the one below 2008-09-02 20:56 hm 2008-09-02 20:56 start at -1, and compare to -2, and so on? 2008-09-02 20:56 where you can always directly look at the limit 2008-09-02 20:56 but have to use a clever trick to look at the offset 2008-09-02 20:57 hmm 2008-09-02 20:57 yes 2008-09-02 20:57 well 2008-09-02 20:58 first set offset to zero 2008-09-02 20:58 mhm 2008-09-02 20:58 then enter the loop at i = 0, and look up dict [i -1] -> limit 2008-09-02 20:59 it's a matter of taste 2008-09-02 20:59 and mine was not good when I wrote the original ;-) 2008-09-02 20:59 the loop should always execute eactly n iterations 2008-09-02 20:59 and it should start from zero, but never access dict[0] 2008-09-02 21:00 I think 2008-09-02 21:00 even if it fails early? 
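The implied zero entry and the "look up dict[i - 1]" loop can be sketched in forward order. The loop runs exactly n times and starts from zero. It only touches the previous limit when one exists. The sketch assumes, hypothetically, that each limit is an exclusive cumulative bound within its group. It also assumes the limits were already checked to be non-descending. `entry_spans` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Entry i covers [start[i], start[i] + size[i]). Its start is the previous
 * entry's limit, or the implied zero for i == 0. Precondition: limits are
 * non-descending, otherwise size[i] would wrap. */
void entry_spans(const uint8_t *limits, size_t n,
                 unsigned *start, unsigned *size)
{
	for (size_t i = 0; i < n; i++) {
		unsigned lo = i ? limits[i - 1] : 0;   /* implied zero entry */

		start[i] = lo;
		size[i] = limits[i] - lo;
	}
}
```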
2008-09-02 21:00 fail means bail 2008-09-02 21:00 zero tolerance of errors 2008-09-02 21:00 so no, except when it fals 2008-09-02 21:00 fails 2008-09-02 21:01 so what I meant was, the loop should not execute n-1 times 2008-09-02 21:01 but n times 2008-09-02 21:01 yeah 2008-09-02 21:01 and let the actual index i not be used in the loop, but i - 1 instead 2008-09-02 21:01 that's like docmentation 2008-09-02 21:01 because you can arrange the loop to be able to use i directly, but that makes it harder to understand 2008-09-02 21:01 the optimizer can easily do that on its own 2008-09-02 21:02 any, I'm talking about what I _should_ have thought about when I wrote the original 2008-09-02 21:02 was in kind of a hurry to get something running 2008-09-02 21:03 wow, genuine uniden replacement batteries cost almost as much as a new handset 2008-09-03 00:43 konrad, remember I suggested shapor review your code for dleaf ;-) 2008-09-03 00:43 I'm afraid I haven't been as good as reviewer as I could 2008-09-03 00:43 hm? 
2008-09-03 00:43 I'm just looking at all the bits of your code I didn't really read ;-) 2008-09-03 00:44 ah that patch was sort of huge glob of stuff 2008-09-03 00:45 int dleaf_ordered(BTREE, struct dleaf *leaf) 2008-09-03 00:45 { 2008-09-03 00:45 struct group *gdict = (void *)leaf + btree->sb->blocksize; 2008-09-03 00:45 struct entry *edict = (void *)(gdict - leaf->groups); 2008-09-03 00:45 tuxkey_t key = 0; 2008-09-03 00:45 --gdict; 2008-09-03 00:45 --edict; 2008-09-03 00:45 for (int group = 0; group < -leaf->groups; group--) { 2008-09-03 00:45 tuxkey_t basekey = (tuxkey_t)gdict[group].loghi << 24; 2008-09-03 00:46 for (int entry = 0; entry < -gdict[group].count; entry--) { 2008-09-03 00:46 tuxkey_t newkey = basekey | edict[entry].loglo; 2008-09-03 00:46 if (key > newkey) 2008-09-03 00:46 return 0; 2008-09-03 00:46 key = newkey; 2008-09-03 00:46 } 2008-09-03 00:46 } 2008-09-03 00:46 return 1; 2008-09-03 00:46 } 2008-09-03 00:46 notice my cavalier attitude towards channel spam ;-) 2008-09-03 00:46 it does less than yours in more lines 2008-09-03 00:46 but it seems to work better 2008-09-03 00:47 looks clear 2008-09-03 00:50 probably complete broken 2008-09-03 00:50 looks ok except you're looking up dict[n] instead of dict[n - 1] 2008-09-03 00:51 which won't work on n = 0 2008-09-03 00:52 the loop inequalities are backwards 2008-09-03 00:52 it sucks ;-) 2008-09-03 00:53 heh 2008-09-03 00:53 these things are a little different than ileafs 2008-09-03 00:53 could just forward loop 2008-09-03 00:53 well 2008-09-03 00:53 to generate the keys it's straightforward 2008-09-03 00:53 generating the offsets is a little trickier 2008-09-03 00:58 ah, a flaw in both of our code 2008-09-03 00:58 you have to keep resetting edict 2008-09-03 00:59 oh? 2008-09-03 00:59 oh, for the current group? 2008-09-03 00:59 I did that 2008-09-03 01:00 I don't see where 2008-09-03 01:00 hm, what are you looking at? 
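As noted right after it was pasted, the loop tests in dleaf_ordered are backwards, and the entry pointer also has to advance past each group's entries. The sketch below iterates forward and also rejects empty groups. It works over a simplified, hypothetical in-memory view: `sk_group`, `sk_entry` and `sk_dleaf_ordered` are not the on-disk dleaf structures.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t tuxkey_t;

/* Hypothetical, simplified view; NOT the tux3 on-disk layout. */
struct sk_group { uint32_t loghi; uint8_t count; };  /* upper 24 key bits */
struct sk_entry { uint32_t loglo; };                 /* lower 24 key bits */

/* Returns 1 if every group is non-empty and all assembled keys are
 * non-descending across the whole leaf, else 0. */
int sk_dleaf_ordered(const struct sk_group *groups, unsigned ngroups,
                     const struct sk_entry *entries)
{
	const struct sk_entry *edict = entries;
	tuxkey_t key = 0;   /* keys are unsigned, so 0 is a safe seed */

	for (unsigned g = 0; g < ngroups; g++) {
		tuxkey_t basekey = (tuxkey_t)(groups[g].loghi & 0xffffff) << 24;

		if (groups[g].count == 0)
			return 0;
		for (unsigned e = 0; e < groups[g].count; e++) {
			tuxkey_t newkey = basekey | (edict[e].loglo & 0xffffff);

			if (newkey < key)
				return 0;
			key = newkey;
		}
		edict += groups[g].count;   /* next group's entries start here */
	}
	return 1;
}
```

Without the `edict +=` step, every group would re-read the first group's entries. That is the flaw identified in both versions above.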
2008-09-03 01:00 http://pastie.org/264980 2008-09-03 01:01 didn't there, right 2008-09-03 01:02 works now 2008-09-03 01:04 cool 2008-09-03 01:04 ok, calculating the offset to make sure that ascends is trickier, we get back into those special zero cases 2008-09-03 01:05 and we have the key generation using just straightforward indexing (though backwards) and the offset generation using demented-off-by-one 2008-09-03 01:05 so there is no pretty way to write it 2008-09-03 01:06 probably the best to is have the dicts one above like before 2008-09-03 01:06 and make the adjustment by one for key lookup, because that is easy 2008-09-03 01:07 and use the same method as for ileaf for the offset generation 2008-09-03 01:07 well 2008-09-03 01:07 hmm 2008-09-03 01:07 dunno 2008-09-03 01:07 like I said, no pretty way 2008-09-03 01:08 it's only the entry->offset that is off by one 2008-09-03 01:08 group->count is directly indexed 2008-09-03 01:08 so mumble 2008-09-03 01:14 ok, this works 2008-09-03 01:14 unsigned members = edict[entry].limit - (entry ? 
edict[entry + 1].limit : 0); 2008-09-03 01:14 could write "size" instead of members 2008-09-03 01:15 then members is the amount by which we increment base offset 2008-09-03 01:15 by definition positive 2008-09-03 01:15 so it can't go backwards, only too high 2008-09-03 01:15 well, the limits can go backwards 2008-09-03 01:15 I think you checked for that 2008-09-03 01:16 so offset can go backwards too 2008-09-03 01:17 I can write this better 2008-09-03 01:29 struct group *gdict = (void *)leaf + btree->sb->blocksize; 2008-09-03 01:29 struct entry *edict = (void *)(--gdict - leaf->groups); 2008-09-03 01:29 better 2008-09-03 01:30 have the dicts sitting on the zeroth entry for dleaf 2008-09-03 01:32 it's how I wrote it originally 2008-09-03 01:32 yay for convergent evolution 2008-09-03 01:34 :D 2008-09-03 01:34 I'll post mine to the list 2008-09-03 01:34 you can slice and dice as you like 2008-09-03 01:34 it's ever so slightly more readable than the original dump 2008-09-03 01:34 sounds good to me 2008-09-03 01:39 I needed to get that skeleton written anyway, as part of the delte 2008-09-03 01:39 delete 2008-09-03 01:41 I'm probably heading to sleep soon 2008-09-03 01:41 gnight 2008-09-03 01:41 I should likewise 2008-09-03 01:41 night 2008-09-03 01:41 bye 2008-09-03 01:41 heh, is it 2am already? 2008-09-03 01:41 bye 2008-09-03 01:41 yup 2008-09-03 09:51 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-03 12:21 hey 2008-09-03 13:18 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-03 14:07 bh hi 2008-09-03 14:08 well there is some leaf truncate code 2008-09-03 14:08 icky code 2008-09-03 14:08 but... now onto the btree part of the truncate 2008-09-03 14:09 soon we will be able to delete files and be much more like a filesystem 2008-09-03 14:14 flips: there's a linkedIn group for file system engineers 2008-09-03 14:15 pointer? 
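The `members` computation above only stays positive if the limits never go backwards. With unsigned arithmetic, a backwards step wraps to a huge count instead of going negative. The sketch below assumes per-group u8 limits, checks before subtracting, and returns -1 on a backwards limit. `walk_offsets` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Advance a running base offset by each entry's member count
 * (limit minus previous limit, with an implied zero before the first).
 * Returns the final offset, or -1 if any limit goes backwards, which
 * would otherwise wrap the unsigned difference to a huge value. */
long walk_offsets(const uint8_t *limits, size_t n, long base)
{
	unsigned prev = 0;

	for (size_t i = 0; i < n; i++) {
		if (limits[i] < prev)
			return -1;
		base += limits[i] - prev;   /* members, never negative here */
		prev = limits[i];
	}
	return base;
}
```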
2008-09-03 14:15 http://www.linkedin.com/groups?about=&gid=64287 2008-09-03 14:15 is there a group for rollerskating filesystem engineers? 2008-09-03 14:15 did that work? 2008-09-03 14:15 who rollerskates around here? 2008-09-03 14:15 ;-) 2008-09-03 14:24 linkedin customer service sucks 2008-09-03 14:43 yup 2008-09-03 14:46 what's your beef with LinkedIn? 2008-09-03 14:47 bugs 2008-09-03 14:47 tell you about it later 2008-09-03 14:47 k 2008-09-03 14:57 -!- pgquiles(~pgquiles@75.Red-81-44-176.dynamicIP.rima-tde.net) has joined #tux3 2008-09-03 14:59 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-03 15:03 http://techvideoblog.com/ifa/98-linux-laptop-the-hivision-mininote/ 2008-09-03 15:03 <- $98 linux laptop 2008-09-03 15:03 apparently costs $120 at the moment 2008-09-03 15:03 600 MHz ARM 2008-09-03 15:03 does it blend? 2008-09-03 15:03 right in 2008-09-03 15:03 nice 2008-09-03 15:04 kay, I just have to hook up the good old ddsnap btree deletion to tux3 and we have file delete 2008-09-03 15:46 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-03 15:52 -!- pgquiles(~pgquiles@75.Red-81-44-176.dynamicIP.rima-tde.net) has joined #tux3 2008-09-03 16:27 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-03 17:04 http://www-01.ibm.com/support/docview.wss?uid=swg21230196 2008-09-03 17:05 that must be ancient 2008-09-03 17:05 found that on the LinkedIn File Systems developers group list 2008-09-03 17:05 look like '02 2008-09-03 17:05 Modified date: 2008-09-03 17:05 2007-02-15 2008-09-03 17:05 my bad 2008-09-03 17:05 it's a network filesystem 2008-09-03 17:17 anyone heard of storspeed before? 2008-09-03 17:21 nope, looks like a stealth storage startup in austin 2008-09-03 17:21 flips: the second looks really clear to me 2008-09-03 17:22 the flattened one? 
2008-09-03 17:22 yeah 2008-09-03 17:23 glad you like it 2008-09-03 17:23 your interest helped me get going on the delete 2008-09-03 17:25 from what I can gather, storspeed is developing a nfs cache solution 2008-09-03 17:35 ack, just spent a couple hours struggling with btree delete where I did a minor typo in the port from ddsnap 2008-09-03 17:36 well, at least this code can be worked on now 2008-09-03 17:36 unlike ddsnap, where it is buried in a huge system with no unit tests 2008-09-03 17:36 scary 2008-09-03 17:36 good thing it never had a bug 2008-09-03 17:38 heh 2008-09-03 17:38 sk8 oclock 2008-09-03 17:41 getting near the end of the just plain filesystem mechanism stuff 2008-09-03 17:41 some interesting bits coming soon 2008-09-03 17:44 wheel swap before skate 2008-09-03 17:47 that means you're planning to skate? 2008-09-03 17:50 hey 2008-09-03 17:50 should be 2008-09-04 02:07 -!- pgquiles(~pgquiles@75.Red-81-44-176.dynamicIP.rima-tde.net) has joined #tux3 2008-09-04 03:14 flips: ? 2008-09-04 03:33 bh, hi 2008-09-04 04:49 -!- pgquiles_(~pgquiles@156.Red-83-33-70.dynamicIP.rima-tde.net) has joined #tux3 2008-09-04 04:50 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-04 05:10 tux3 just learned how to delete 2008-09-04 06:04 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-04 06:42 whoo! 
2008-09-04 10:05 flips: http://kerneltrap.org/Linux/Tux3_Acting_Like_A_Filesystem 2008-09-04 11:07 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-04 11:38 -!- pgquiles__(~pgquiles@209.Red-81-32-36.dynamicIP.rima-tde.net) has joined #tux3 2008-09-04 12:15 konrad@hopeless test $ echo "Hello tux3 world" > tmp/tux3 2008-09-04 12:15 konrad@hopeless test $ cat tmp/tux3 2008-09-04 12:15 Hello tux3 world 2008-09-04 12:15 FUSE + tux3 2008-09-04 12:19 just for fun :) 2008-09-04 12:20 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-04 12:20 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-04 12:20 flips: for when you wake up, ping 2008-09-04 13:01 konrad: oh? 2008-09-04 13:01 i was wondering when someone was going to do that ;) 2008-09-04 13:02 :D 2008-09-04 13:08 it's very much ugly, incorrect, and a hack 2008-09-04 13:14 konrad, pong 2008-09-04 13:15 fuse-tux3, reads/writes sort of 2008-09-04 13:15 :-D 2008-09-04 13:15 using routines basically copied from tux3.c 2008-09-04 13:16 that's what they're for 2008-09-04 13:16 konrad, you are hereby offically annoited the maintainer for the fuse fork 2008-09-04 13:17 ouch 2008-09-04 13:17 :D 2008-09-04 13:18 ok what you probably need most right now is control over the tracing output 2008-09-04 13:18 I was going to turn all the trace(printf( into just trace( 2008-09-04 13:19 and have trace be a real function instead of a macro 2008-09-04 13:20 I wasn't totally serious about fuse-tux3 2008-09-04 13:20 fsck, my typo is now splatted all over the web 2008-09-04 13:20 I wasn't totally serious about tux3 2008-09-04 13:20 heh 2008-09-04 13:21 see my post to the btrfs list 2008-09-04 13:21 "considering the wisdom" 2008-09-04 13:21 so now I know it was a stupidly big job ;-) 2008-09-04 13:21 is that a list worth subscribing to? 
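Turning `trace(printf(...))` into a plain `trace(...)` function, as proposed above, could look roughly like this. `trace_enabled` is a hypothetical run-time switch. It stands in for whatever control over tracing output the FUSE hack ends up needing.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

int trace_enabled = 1;   /* hypothetical switch; 0 silences tracing */

/* A real variadic function instead of a trace(printf(...)) macro.
 * Returns the number of characters written to stderr, or 0 when
 * tracing is disabled. */
int trace(const char *fmt, ...)
{
	va_list ap;
	int n;

	if (!trace_enabled)
		return 0;
	va_start(ap, fmt);
	n = vfprintf(stderr, fmt, ap);
	va_end(ap);
	return n;
}
```

Because the gate is a variable instead of a compile-time macro, a front end can turn tracing on or off without rebuilding.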
2008-09-04 13:21 http://kerneltrap.org/Linux/Tux3_Acting_Like_A_Filesystem 2008-09-04 13:22 mildly interesting 2008-09-04 13:22 mostly meat and potatoes debugging 2008-09-04 13:22 saw that 2008-09-04 13:22 zfs list is more interesting 2008-09-04 13:22 because more bugs ;-) 2008-09-04 13:22 heh 2008-09-04 13:22 I know someone who switched his /home over to fuse-zfs recently 2008-09-04 13:23 but he was on reiser4 before that 2008-09-04 13:23 hardcore 2008-09-04 13:23 perfect crash test dummy for tux3 2008-09-04 13:23 heh, yes 2008-09-04 13:24 get ready to post your fuse hack to lkml I would say 2008-09-04 13:24 pick the 3 worst things about it, fix, then post 2008-09-04 13:25 ehhh 2008-09-04 13:25 1. It exists. 2008-09-04 13:26 that's both a bug and a feature 2008-09-04 13:26 2. It brings tux3 up even with zfs on linux :D 2008-09-04 13:26 sort of 2008-09-04 13:26 haha 2008-09-04 13:27 only basic reads/ writes work 2008-09-04 13:27 just kidding 2008-09-04 13:27 morally even 2008-09-04 13:31 have to say fuse is pretty easy to work with 2008-09-04 13:40 I am ashamed to say I never tried 2008-09-04 13:43 ah, kerneltrap posted my repost without the typo 2008-09-04 13:51 flips: did you see the lvm snapshot merging patch? 2008-09-04 13:52 folks 2008-09-04 13:53 hey bh 2008-09-04 13:53 hey 2008-09-04 13:53 when arey ou oflks going to rock the LInux file systems world ? 2008-09-04 13:53 bah 2008-09-04 13:53 when are you folks going to rock the LInux file systems world ? 
2008-09-04 13:53 better :) 2008-09-04 13:53 man my typing if f-ed right now 2008-09-04 13:53 depends on how much sleep flips gets 2008-09-04 13:54 yeah, the sooner the better 2008-09-04 13:54 the more he misses, the closer it comes to linux fs world rocking 2008-09-04 13:54 :-) 2008-09-04 13:54 show folks out of some barbaric crap 2008-09-04 13:54 tim_dimm, that's for the heads up 2008-09-04 13:54 yeah, I was talking to him late last night, he should be up by now 2008-09-04 13:55 thanks I mean 2008-09-04 13:55 my pleasure dude 2008-09-04 13:55 it was convenient I actually check in the patch I was talking about in the post kerneltrap linked 2008-09-04 13:55 when its done, you can sleep ;-) 2008-09-04 13:55 before sleeping 2008-09-04 13:58 -!- eli(~elicriffi@66.249.86.209) has joined #tux3 2008-09-04 13:59 about time to improve the tux3 front page, no? 2008-09-04 13:59 I think the geek chic may be wearing a little thin 2008-09-04 14:00 shapor's site looks pretty good 2008-09-04 14:00 just port that over 2008-09-04 14:00 flips: yeah, reading that online checking post more, lots of details in it 2008-09-04 14:00 shapor? 2008-09-04 14:01 bh, the main point is there 2008-09-04 14:01 that one can accelerate online checking with a small amount of additional metadata, rarely accessed 2008-09-04 14:03 shapor already put up today's press link :-) 2008-09-04 14:03 have you mapped out the cases yet for checking ? 2008-09-04 14:03 other than the basic inode tree integrity stuff ? 2008-09-04 14:05 I thought I did that in the post 2008-09-04 14:05 you mean more detail? 2008-09-04 14:05 yeah, well, in your mind regardless of the docs 2008-09-04 14:05 yes 2008-09-04 14:05 good :) 2008-09-04 14:05 there is the inode level and the directory level 2008-09-04 14:06 because I'm sick of how lame Linux file systems are 2008-09-04 14:06 you have to have a "good" bit on a compartment of the inode level before checking the directory level 2008-09-04 14:06 what about the directory link count ? 
solved by groups again ? 2008-09-04 14:10 ACTION is still reading the post 2008-09-04 14:11 hi eli 2008-09-04 14:11 shapor, you want to meet eli 2008-09-04 14:12 googler, ex cluster admin 2008-09-04 14:12 maze too 2008-09-04 14:12 ? 2008-09-04 14:13 maze, do you know eli? 2008-09-04 14:13 nope 2008-09-04 14:13 came to the zumastor talk, we hung out after 2008-09-04 14:13 googler alert, aisle 5 2008-09-04 14:13 has got hands on with some interesting stuff, like lustre and redhat gfs 2008-09-04 14:13 hmm, that's not his login name then is it? 2008-09-04 14:14 prolly not 2008-09-04 14:15 hmm, that guy we sat at the whiteboard with? 2008-09-04 14:15 hmm, and where's aisle 5? 2008-09-04 14:16 heh 2008-09-04 14:16 :-) 2008-09-04 14:16 just me being the smartass that I am 2008-09-04 14:16 MaZe, haven't met you yet either 2008-09-04 14:16 you need to 2008-09-04 14:16 two awesome dudes 2008-09-04 14:17 hi 2008-09-04 14:17 in fact now that I think about it, everybody on the channel is awesome ;-) 2008-09-04 14:17 even the bot 2008-09-04 14:17 eli, hi 2008-09-04 14:17 elicriffield@google 2008-09-04 14:17 hey, eli ;-) 2008-09-04 14:17 ah 2008-09-04 14:17 by our mere presence, we are awesome 2008-09-04 14:17 flips: yeah, I was about to suggest using some kind of centralize reverse map metadata file of some sort 2008-09-04 14:18 the bot is obviously the most awesome, by virtue of being here the longest 2008-09-04 14:18 focusing on early mounting is the end goal here 2008-09-04 14:18 the sooner it can be done the better 2008-09-04 14:19 Maze: :-) 2008-09-04 14:19 so maybe using delayed freeing or something like for deletion until enough of the disk has been verified so that you know it's safe to do so without crushing something would be good 2008-09-04 14:19 basically the normal stuff 2008-09-04 14:20 hi eli 2008-09-04 14:20 hey 2008-09-04 14:20 problem here is that it's kind of wacky for a Unix/Linux style system to groke 2008-09-04 14:20 bh, not sure the reverse makes sense as 
a file... maybe 2008-09-04 14:20 grok 2008-09-04 14:20 actually maybe that could be good 2008-09-04 14:20 flips: well you have it as something else 2008-09-04 14:20 if the reverses can be fixed size 2008-09-04 14:20 so you can directly index the reverse of a block 2008-09-04 14:21 the thing about it is it'll be small 2008-09-04 14:21 get a token into some other structure 2008-09-04 14:21 and easily verified by a simple inode integrity check 2008-09-04 14:21 well, then you can map it into the page cache 2008-09-04 14:21 decending downward 2008-09-04 14:21 downward? 2008-09-04 14:21 you can special case it and not worry about aliasing, etc... 2008-09-04 14:21 inode->indirect->data 2008-09-04 14:21 the normal stuff 2008-09-04 14:22 aliasing=multiple references 2008-09-04 14:23 bh, I'm going to refresh the online check doc and add it to the design mix 2008-09-04 14:23 got to be thought about from early 2008-09-04 14:23 because you'll have only one per voluem and it's special cased 2008-09-04 14:23 right 2008-09-04 14:23 right, which is why I'm mentioning it to you 2008-09-04 14:23 going to drop the multivolume wanking 2008-09-04 14:23 so you have considered a lot of things beforehand and not deadend your project 2008-09-04 14:24 from a design point of view 2008-09-04 14:24 it's a simple solution for a difficult to track and reproduce data structure, could be too simplistic, you're the fs expert here not me 2008-09-04 14:25 in the online cheker code, it'll have to be one of the first metadata files to check other than the allocation map 2008-09-04 14:27 bh, I think it's about as simplistic as necessary 2008-09-04 14:28 it would be wrong to bog down the fs with a topheavy structure just for fsck 2008-09-04 14:29 it's just an idea 2008-09-04 14:29 I wasn't rejecting 2008-09-04 14:29 hugs? 
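The fixed-size reverse map floated above would let a checker find the owner of any block by direct indexing. Below is a minimal sketch of the cross-check a walker could run for every forward mapping it visits. The record layout (`struct sk_reverse`) and the function name `reverse_agrees` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-size reverse record: owner inode and logical block.
 * Fixed size means the record for physical block b is simply rmap[b]. */
struct sk_reverse {
	uint64_t inum;      /* owning inode, 0 = free */
	uint64_t logical;   /* logical block within the owner */
};

/* For each (inum, logical) -> physical mapping found while walking a
 * file's forward index, the reverse record must point straight back. */
int reverse_agrees(const struct sk_reverse *rmap, uint64_t nblocks,
                   uint64_t physical, uint64_t inum, uint64_t logical)
{
	if (physical >= nblocks)
		return 0;   /* pointer outside the volume */
	return rmap[physical].inum == inum &&
	       rmap[physical].logical == logical;
}
```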
2008-09-04 14:29 just pointing out that it's not necessary for the required persistent structure to be complex 2008-09-04 14:29 sure, like I said, it's just an idea 2008-09-04 14:30 so what's necessary is to reintroduce a notion of allocation groups 2008-09-04 14:30 the same could apply for not so frequent reverse mappings as well, there still has to be an ordering issue to be considered at mount/check time 2008-09-04 14:30 and record any pointers that cross those groups 2008-09-04 14:31 ordering? 2008-09-04 14:31 the good thing about having it as a file is that it can be easily checked in a single file 2008-09-04 14:31 yeah mount check ordering 2008-09-04 14:31 still don't get it 2008-09-04 14:31 you want to be able to moun this volume asap 2008-09-04 14:32 exactly 2008-09-04 14:32 but you need a certain set of metadata checked first 2008-09-04 14:32 that's why all checks are incremental 2008-09-04 14:32 whether or not it's consistent or not 2008-09-04 14:32 very little is checked before it stops 2008-09-04 14:32 starts 2008-09-04 14:32 not much more than the sb magic number 2008-09-04 14:32 tux3 already checks magic numbers on inode table leaves and file index leaves by the way 2008-09-04 14:32 well, what about checking the bare minimal metadata before you can mount it ? what is that ? 2008-09-04 14:33 1) allocation map 2008-09-04 14:33 respectively 0x90de and 0xc0de 2008-09-04 14:33 2) some random reverse map 2008-09-04 14:33 ... 2008-09-04 14:33 what else ? 2008-09-04 14:33 list it 2008-09-04 14:33 that's my suggestion 2008-09-04 14:33 no checking before mount ;-) 2008-09-04 14:33 well, ok 2008-09-04 14:33 incremental checking starts as soon as root dir is opened 2008-09-04 14:34 don't you want your b-tree checked before doing something with it ? 2008-09-04 14:34 it gets checked on the fly 2008-09-04 14:34 the btree code has to be written not to oops even if you feed it random numbers 2008-09-04 14:34 what about other metadata like the allocation map ? 
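One way to keep the btree code from oopsing on random data is to check each leaf's magic number along with a bounds check on its entry count. Per the chat, the magic numbers are 0x90de for inode table leaves and 0xc0de for file index leaves. The header layout, the byte order and `leaf_sane` below are assumptions and do not reflect the tux3 format.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SK_ILEAF_MAGIC 0x90de   /* inode table leaf, per the chat */
#define SK_DLEAF_MAGIC 0xc0de   /* file index leaf, per the chat */

/* Hypothetical header: 16-bit magic then 16-bit count, native byte order. */
struct sk_leafhead {
	uint16_t magic;
	uint16_t count;
};

/* Validate a leaf before trusting it, so garbage read from disk yields
 * -EIO instead of a wild pointer. entsize is the size of one entry. */
int leaf_sane(const void *block, unsigned blocksize, uint16_t magic,
              unsigned entsize)
{
	struct sk_leafhead head;

	if (entsize == 0)
		return -EINVAL;
	if (blocksize < sizeof head)
		return -EIO;
	memcpy(&head, block, sizeof head);
	if (head.magic != magic)
		return -EIO;
	if (head.count > (blocksize - sizeof head) / entsize)
		return -EIO;   /* entries would run past the end of the block */
	return 0;
}
```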
2008-09-04 14:34 also checked on the fly 2008-09-04 14:34 what if you have a corruption in the b-tree ? how are you going to deal with that ? 2008-09-04 14:35 ext2/3 work like this and it is very effective 2008-09-04 14:35 or a corruption in the allocation map ? 2008-09-04 14:35 except they don't do the only the fly fsck 2008-09-04 14:35 corruption detected in the btree... means a scan for lost leaves has to occur 2008-09-04 14:35 and eio on that file until it's complete 2008-09-04 14:35 maybe 2008-09-04 14:37 just had a thought 2008-09-04 14:37 we could have a special log item every now and then that just duplicates some throwaway copies of the top few levels of the inode btree 2008-09-04 14:37 these structures rarely change 2008-09-04 14:38 its quiting time for me, i'll be back, good seeing you again flips and maze 2008-09-04 14:38 so when they do, just invalidate that log item 2008-09-04 14:39 flips: can we please support front truncation of files? 2008-09-04 14:39 maze, I am writing a post about that right now ;-) 2008-09-04 14:39 among other things 2008-09-04 14:39 oh, awesome 2008-09-04 14:39 "the long and short of truncation" 2008-09-04 14:39 sparse checking should be easy 2008-09-04 14:39 do you mean, actually moving data forward logically, or just punching a hole in the front? 2008-09-04 14:39 it's been my belief that appendable front truncatable files are the best 2008-09-04 14:40 because unaligned front truncation is nasty 2008-09-04 14:40 no, freeing blocks from the front, if it's unaligned then you end up with a full block, with some bytes 'unused' 2008-09-04 14:40 right, and you leave the logical addresses untouched? 2008-09-04 14:41 by logical you mean from the point of view of the apps? 
yes, they just see zeroes there, if they seek, but opening the file would preferably (probably not doable, since this is vfs) seek to the first 'used' byte 2008-09-04 14:42 right, that's exactly what I'm implementing 2008-09-04 14:42 this would be just awesome for any sort of logging 2008-09-04 14:42 there's a big comment in the code about how deficient my first cut truncate function is in that respect ;-) 2008-09-04 14:42 otherwise the vfs would need a patch to seek to non-free byte first 2008-09-04 14:43 http://tux3.org/tux3?fd=b64615fb8a11;file=user/test/dleaf.c 2008-09-04 14:43 my belief is large files should be immutable, front-truncatable, appendable 2008-09-04 14:43 front truncation is not an option in tux3 2008-09-04 14:43 versioning requires it 2008-09-04 14:44 heh 2008-09-04 14:44 "hole punch" is the usual term 2008-09-04 14:44 there's a nasty little corner case with extents 2008-09-04 14:44 punch a hole in the middle of an extent, the metadata expands 2008-09-04 14:45 unlike all other hole punches 2008-09-04 14:45 punch a hole in the middle of a versioned extent and braindamage is a clear and present danger 2008-09-04 14:49 hole punch is a telecine term 2008-09-04 14:49 hole punching is different - because you can do it in the middle of files 2008-09-04 14:49 that's why we like it 2008-09-04 14:49 that also has a usecase 2008-09-04 14:50 what is the difference between a hole punch at the beginning of a file and a front truncate? 
2008-09-04 14:51 just had a thought: it would be nice to have an ioctl to return the first "present" offset in a file, for log reading 2008-09-04 14:51 offset padded down to block boundary with zeros 2008-09-04 14:52 then that exabyte upper limit actually makes sense 2008-09-04 14:52 it could be conceivable to actually hit that one day 2008-09-04 14:52 and have to rotate ;-) 2008-09-04 14:56 news item: msft browsers lose just .5 more share to non-msft then it will be a market share tie on w3schools 2008-09-04 14:56 meaning half the people learning about html as learning it with a non-msft browser 2008-09-04 14:57 just had a thought: it would be nice to have an ioctl to return the first "present" offset in a file, for log reading -> precisely, I'd envision that to be the default location when you open such a file 2008-09-04 14:57 hah! 2008-09-04 14:57 nice 2008-09-04 14:57 open(...O_FRONT); 2008-09-04 14:58 loc padded down to block boundary 2008-09-04 14:58 pos I mean 2008-09-04 14:58 and aligned I mean 2008-09-04 14:59 everybody missed my pun on tree chopping so far 2008-09-04 14:59 btree_chop 2008-09-04 15:00 veritable paul bunyan 2008-09-04 15:00 I suppose that's because I didn't spell it that way 2008-09-04 15:00 currently leaf_chop, I have to change it 2008-09-04 15:07 tim_dimm, you now have a commit dedicated to you 2008-09-04 15:07 http://tux3.org/tux3 2008-09-04 15:07 the paul bunyan commit? 2008-09-04 15:07 flips: online checking the most important and igored things I know of in most file systems today 2008-09-04 15:07 that one yes 2008-09-04 15:07 bh, ignored no longer 2008-09-04 15:07 my first commit dedication- I'm touched 2008-09-04 15:07 I'll get a post together in the next couple of days 2008-09-04 15:08 flips: by you or folks in the general community ? 
2008-09-04 15:08 I'll also start thinking about relative block pointers 2008-09-04 15:09 could get some excellent compression that way 2008-09-04 15:09 also code complexity ;-) 2008-09-04 15:09 because I think its the best stop gap measure we have so far for petabyte volumes 2008-09-04 15:09 bh, by the tux3 community 2008-09-04 15:09 yeah, thanks ;) 2008-09-04 15:09 I'm so glad somebody listens to me 2008-09-04 15:09 because this is a critical problem 2008-09-04 15:09 we'll always be here for you ;-) 2008-09-04 15:10 probably the most important problem for file systems as this moment followed by snapshots 2008-09-04 15:10 :) 2008-09-04 15:10 one of them, true 2008-09-04 15:10 I just hope that I'm being useful in these discussions 2008-09-04 15:10 just plain bugginess is probably the number one problem 2008-09-04 15:10 affects reiser4, zfs, btrfs 2008-09-04 15:10 maybe not hammer 2008-09-04 15:11 and maybe that is because I'm just not reading the bug reports 2008-09-04 15:11 I suspect it's just plain "not hammer" 2008-09-04 15:14 bh: just curious. what's your day gig? 2008-09-04 15:15 (pardon my curiosity) 2008-09-04 15:16 flips, after return 0;, how about a print "TIMBER" 2008-09-04 15:16 good call 2008-09-04 15:16 well 2008-09-04 15:17 we want to chop the tree, not bring down the system ;-) 2008-09-04 15:17 it's really tree_disintegrate 2008-09-04 15:17 tim_dimm: Novell's R&D group 2008-09-04 15:17 gotcha 2008-09-04 15:17 I'm mostly a concurrency person with the -rt patch 2008-09-04 15:17 mr locking 2008-09-04 15:17 how about writing "timberrrr" on the _error exit_ to chop tree? 2008-09-04 15:17 locking instrumentation specifically 2008-09-04 15:18 and -rt conversion of the kernel to be fully preemptible 2008-09-04 15:18 bh is going to test his locking intrumentation on tux3 2008-09-04 15:18 ok, I'm thinking... 
what next 2008-09-04 15:18 nice 2008-09-04 15:18 pretty much me and Ingo are the only two folks on this planet to have made some attempt and map out the problem space for that 2008-09-04 15:18 and I'm am awfully close to saying, kernel port 2008-09-04 15:18 but it's ingo's patch 2008-09-04 15:18 -rt that is 2008-09-04 15:18 I think it's kernel port next 2008-09-04 15:19 first a little cleanup of the current source 2008-09-04 15:19 not much cleanup 2008-09-04 15:19 we're going to do a proper fork for the kernel I think 2008-09-04 15:19 too hard to make a lot of the code match 2008-09-04 15:19 we'll see 2008-09-04 15:19 tim_dimm: but I'm ex-netapp WAFL 2008-09-04 15:19 bh does a good job of not telling me any secrets ;-) 2008-09-04 15:19 mostly as a sustaining engineer which the vast majority of that kind of work there 2008-09-04 15:19 oh wow 2008-09-04 15:20 yeah, so I've seen what an enterprise file system should look like roughly and Linux is far far behind that 2008-09-04 15:20 agreed 2008-09-04 15:20 as one of the guilty parties 2008-09-04 15:20 bh: I'm bizdev at MetaRAM 2008-09-04 15:20 what is that ? 2008-09-04 15:20 thus the _dimm part 2008-09-04 15:20 big memory 2008-09-04 15:20 metaram is cool 2008-09-04 15:21 fred weber, former cto of amd's startup 2008-09-04 15:21 double the ddr2 limit 2008-09-04 15:21 so you're an engineer or a business person ? 2008-09-04 15:21 and ddr3 2008-09-04 15:21 biz with an ear for engineering 2008-09-04 15:21 and why are you here btw ? 2008-09-04 15:21 bizdev for flips 2008-09-04 15:21 roller skate instructor 2008-09-04 15:21 right flips? 2008-09-04 15:21 yup 2008-09-04 15:21 I consult 2008-09-04 15:21 our biz is sk8ing 2008-09-04 15:22 zen of sk8biz 2008-09-04 15:22 used to be at violin memory 2008-09-04 15:22 that's where we met 2008-09-04 15:22 right 2008-09-04 15:22 1/2TB of DRAM used as storage 2008-09-04 15:22 tim_dimm: eh ? 
2008-09-04 15:22 now you can cram that 1/2TB into a server 2008-09-04 15:22 violin is a ssd startup 2008-09-04 15:22 I'm just wondering what your interest here is as a non-engineer, seems odd 2008-09-04 15:23 uh, kinda hard to articulate in one line 2008-09-04 15:23 and I'm suspicious of business folks in general as a rule :) 2008-09-04 15:23 well, I'm interested 2008-09-04 15:23 you could be an EMC mole of some sort or something 2008-09-04 15:23 :) 2008-09-04 15:23 my role here is tux3 evangelism 2008-09-04 15:23 no, i just skate fast 2008-09-04 15:24 no mole action 2008-09-04 15:24 so you're in LA with flips ? 2008-09-04 15:24 yup 2008-09-04 15:24 venice 2008-09-04 15:24 ah ok 2008-09-04 15:24 nice ot know 2008-09-04 15:24 to know 2008-09-04 15:24 bh, tim_dimm is learning C 2008-09-04 15:24 u? 2008-09-04 15:24 among other things 2008-09-04 15:24 bh, from the biggest C slackers on the block 2008-09-04 15:24 just a random angry kernel dude 2008-09-04 15:25 akd 2008-09-04 15:25 apparently angry enough to be a Solaris kernel engineer according to them :) 2008-09-04 15:25 rakd 2008-09-04 15:25 tim_dimm is learning C about the same rate I'm learning skating 2008-09-04 15:25 gee, hope so 2008-09-04 15:25 bh, I used to be in post production as an online editor and colorist 2008-09-04 15:25 I wonder how big a bribe sun would offer for me to work on zfs ;-) 2008-09-04 15:26 I'd be happy to 2008-09-04 15:26 what got me into technology was i/o bottlenecks 2008-09-04 15:26 start: rm * 2008-09-04 15:26 oh really 2008-09-04 15:26 ? 2008-09-04 15:26 and abandon tux3 ? 2008-09-04 15:26 depends on the size of the bribe 2008-09-04 15:26 uncompressed 4k digital cinema is 48MB per frame, 1.2GB/s 2008-09-04 15:26 you're doing well at Google and happy right ? 
2008-09-04 15:26 ACTION has his price 2008-09-04 15:26 ok 2008-09-04 15:26 nice to know that you have a price 2008-09-04 15:26 good thing it's open source hmm 2008-09-04 15:27 can't take it back now 2008-09-04 15:27 it's unlikely that they'll give you that bribe unless they're in need of something for that NetApp/Sun lawsuit 2008-09-04 15:27 don't you think that tux3 would be a better file system ? 2008-09-04 15:28 bh: did you ever read flips' ramback, faster than a speeding bullet post? 2008-09-04 15:28 bh, of course it will 2008-09-04 15:28 tux3 will have about 1/10th the cache footprint of zfs 2008-09-04 15:28 which is a major source of bugs for them 2008-09-04 15:29 and I really don't care that tux3 can only access an exabyte while zfs can boil the oceans 2008-09-04 15:29 my fs is not for boiling oceans, I like the oceans the way they are 2008-09-04 15:29 really, who's going to utilize more than that in a single namespace 2008-09-04 15:30 ...in the next 5 years 2008-09-04 15:30 cern 2008-09-04 15:30 in 5 years tux3.1 will be out 2008-09-04 15:30 t-minus 6 days, btw 2008-09-04 15:31 nice to know we've got that long to live 2008-09-04 15:31 6 days or 5 yrs 2008-09-04 15:31 "the god particle" 2008-09-04 15:31 just don't be evil for the next 6 days as insurance 2008-09-04 15:31 6 days 2008-09-04 15:31 I suggested we name our son "higgs" 2008-09-04 15:31 Higgs Huber sounds dumb though 2008-09-04 15:31 that would be cool 2008-09-04 15:32 sounds good 2008-09-04 15:32 really 2008-09-04 15:32 so we settled for Pi 2008-09-04 15:32 I like higgs more for what it's worth 2008-09-04 15:32 I did too 2008-09-04 15:32 tim_dimm: no 2008-09-04 15:32 flips, wanna give the ramback one liner? 
2008-09-04 15:32 higgs would be a most excellent middle name 2008-09-04 15:32 sounds "rich" 2008-09-04 15:33 hoidy toidy 2008-09-04 15:33 bh, ramback: every little factor of 25 performance improvement really helps 2008-09-04 15:34 http://lwn.net/Articles/272534/ 2008-09-04 15:39 ACTION reads 2008-09-04 15:53 konrad, could you post your fuse recipe to the tux3 list? 2008-09-04 15:54 ACTION rolls, really 2008-09-04 15:54 what happen to your userspace porting of the Linux page cache or something like that ? 2008-09-04 15:55 fuse-tux3.c + build instructions? 2008-09-04 15:56 it's not pretty, but ok 2008-09-04 15:56 I'll try and clean it up a bit first 2008-09-04 16:01 $ dd if=/dev/zero of=tmp/zeros2 count=1000 2008-09-04 16:01 1000+0 records in 2008-09-04 16:01 1000+0 records out 2008-09-04 16:01 512000 bytes (512 kB) copied, 13.9749 s, 36.6 kB/s 2008-09-04 16:02 it's not the quickest thing 2008-09-04 16:27 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-04 16:55 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-04 17:28 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-04 18:19 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-04 20:37 bh, the user space portin of the linux page cache was completed 3 or 4 weeks ago 2008-09-04 20:37 porting 2008-09-04 20:37 there was a kind of dazed sounding post about it 2008-09-05 00:07 -!- stargazr5(~gaurav@59.95.6.25) has joined #tux3 2008-09-05 00:11 -!- cdk(~chinmay@59.95.14.95) has joined #tux3 2008-09-05 00:15 got to consider what to hack next 2008-09-05 00:15 I'll sleep on it 2008-09-05 00:17 wow, yummy fuse code from conrad 2008-09-05 00:17 unyum 2008-09-05 00:17 :D 2008-09-05 00:19 @konrad:: trying to compile ur fuse file....getting errors.. 2008-09-05 00:19 line 256 2008-09-05 00:19 hm? 2008-09-05 00:19 error? 
2008-09-05 00:19 unknown filed 'key' 2008-09-05 00:19 specified in initializer 2008-09-05 00:20 is your checkout of tux3 up to date? 2008-09-05 00:20 ok....i thought so....one moment will get back 2008-09-05 00:21 installing fuse now 2008-09-05 00:22 enjoy the pain 2008-09-05 00:22 lots of segfaults 2008-09-05 00:22 and no proper readdir 2008-09-05 00:22 I'm not sure how it's done, so I just ignored it completely 2008-09-05 00:25 tux3fuse.c:113:47: error: macro "fuse_main" passed 4 arguments, but takes just 3 2008-09-05 00:25 eh, you must have a different version of fuse than me 2008-09-05 00:25 the example on the fuse site shows 3 args 2008-09-05 00:25 but my version takes 4 2008-09-05 00:25 I have 2.7.3 2008-09-05 00:27 just get rid of the NULL argument 2008-09-05 00:29 fuse: failed to exec fusermount: No such file or directory 2008-09-05 00:30 the mountpoint or the fake filesystem? 2008-09-05 00:31 I think there is no fusermount 2008-09-05 00:31 ah, weird 2008-09-05 00:31 what distro? 2008-09-05 00:32 or I don't have fuse compiled into my kernel 2008-09-05 00:32 debian etch 2008-09-05 00:32 I'll run it on something else 2008-09-05 00:32 hm 2008-09-05 00:32 I'm on Fedora 9 2008-09-05 00:32 need to install fuse-utils I think 2008-09-05 00:32 ah 2008-09-05 00:33 got the latest version .... still cant compile :: undefined reference to `btree_delete' 2008-09-05 00:33 are you compiling it correctly? 2008-09-05 00:33 cdk, change that to tree_chop 2008-09-05 00:33 oh, did it change? 
2008-09-05 00:33 k 2008-09-05 00:33 it did 2008-09-05 00:33 gratuitous 2008-09-05 00:33 heh, I guess I'm out of date :D 2008-09-05 00:33 going to have to be more careful about that now 2008-09-05 00:34 you might want to post a rebased version 2008-09-05 00:34 just for now until its merged 2008-09-05 00:34 I'll do that tomorrow, got to sleep now 2008-09-05 00:34 fusermount: failed to open /dev/fuse: No such file or directory 2008-09-05 00:34 I suppose I have to make the devnode 2008-09-05 00:34 is the fuse module loaded into your kernel? 2008-09-05 00:35 or somit 2008-09-05 00:35 no 2008-09-05 00:35 I'm not totally familiar with fuse 2008-09-05 00:35 but I need to make the devnode anyway 2008-09-05 00:35 I think 2008-09-05 00:35 ok compiled 2008-09-05 00:35 cdk is going to beat me ;-) 2008-09-05 00:35 heh 2008-09-05 00:36 sudo mknod -m 666 /dev/fuse c 10 229 2008-09-05 00:37 fusermount: fuse device not found, try 'modprobe fuse' first <- maybe as far as I get tonight 2008-09-05 00:37 yes....mounted and visible.. 2008-09-05 00:37 :) 2008-09-05 00:37 :) 2008-09-05 00:38 and I'm sure, very breakable 2008-09-05 00:38 cdk: now count the seconds until segfault 2008-09-05 00:38 :D 2008-09-05 00:38 we'll fix that 2008-09-05 00:38 :D 2008-09-05 00:39 compiling a fuse module 2008-09-05 00:39 stupid make decided to recompile the whole kernel 2008-09-05 00:39 oh no 2008-09-05 00:40 actually it did the right thing 2008-09-05 00:40 but I compiled the wrong thing 2008-09-05 00:40 configfs :p 2008-09-05 00:41 eh 2008-09-05 00:41 heh 2008-09-05 00:41 ok....loop while deleting file.. 2008-09-05 00:41 found it 2008-09-05 00:41 "filesystem in userspace support" 2008-09-05 00:42 cdk, loop? 2008-09-05 00:42 have more that one file on the fs....after fuse shows only one file hello??? 2008-09-05 00:42 cdk: yes. 
2008-09-05 00:42 readdir doesn't work 2008-09-05 00:42 so remember the name you used before and cat the file 2008-09-05 00:42 that much should work 2008-09-05 00:42 mounted 2008-09-05 00:42 k 2008-09-05 00:43 readdir will work tomorrow morning ;-) 2008-09-05 00:43 need to sleep now 2008-09-05 00:43 there is the hello file 2008-09-05 00:43 how did it get there? 2008-09-05 00:43 where are u guys...its 1:30 afternoon here 2008-09-05 00:43 flips: look at tux3_readdir() 2008-09-05 00:43 it's static 2008-09-05 00:44 cdk: Pacific time, west coast USA 2008-09-05 00:44 in india 2008-09-05 00:44 konrad, nice 2008-09-05 00:44 I'll make it real pretty soon 2008-09-05 00:44 excellent 2008-09-05 00:45 konrad, this is a most pleasant development 2008-09-05 00:45 good 2008-09-05 00:45 cdk, maybe I'll drop by one of these days ;-) 2008-09-05 00:45 ofcourse 2008-09-05 00:45 always welcome 2008-09-05 00:46 and you're good enough to get fuse working faster than me ;-) 2008-09-05 00:46 course I'm kind of lame at stuff like that 2008-09-05 00:46 ;-) 2008-09-05 00:46 shapor is missing the fun 2008-09-05 00:46 yeah only good at making stuff work....no good at devel 2008-09-05 00:46 cdk, we can fix that 2008-09-05 00:46 hi flips 2008-09-05 00:47 everybody's running tux3-fuse while you're... 2008-09-05 00:47 doing something ;-) 2008-09-05 00:47 having a life maybe 2008-09-05 00:47 yeah, a bit of one anyway lol 2008-09-05 00:47 sh-3.1# echo hello world >foo/foo 2008-09-05 00:47 sh-3.1# cat foo/foo 2008-09-05 00:47 hello world 2008-09-05 00:47 sh-3.1# 2008-09-05 00:48 amazing 2008-09-05 00:48 :D 2008-09-05 00:48 it's mountable 2008-09-05 00:48 it is 2008-09-05 00:48 developers should come pouring in now 2008-09-05 00:48 tomorrow morning readdir will work 2008-09-05 00:48 yah 2008-09-05 00:48 I need to prepare for this by sleeping now ;-) 2008-09-05 00:49 oh 2008-09-05 00:49 I'll check it in first 2008-09-05 00:49 rebased and all? 
2008-09-05 00:49 yes, that was easy 2008-09-05 00:49 k, good 2008-09-05 00:49 saves me 30 seconds tomorrow morning 2008-09-05 00:50 konrad: great work on getting tux3 up in fuse ;) 2008-09-05 00:50 ok....flips u beat me to cat... 2008-09-05 00:51 i still cant do that 2008-09-05 00:51 thanks shapor 2008-09-05 00:51 heh 2008-09-05 00:51 I caught up 2008-09-05 00:51 tux3fuse.c, ok konrad? 2008-09-05 00:51 and have it in the same directory for now 2008-09-05 00:51 sure 2008-09-05 00:51 its great to see it looking like a filesystem before the kernel port 2008-09-05 00:52 really 2008-09-05 00:52 will make testing quite scriptable 2008-09-05 00:52 when can we expect the kernel port? 2008-09-05 00:52 and tux3 university starts soon 2008-09-05 00:52 heh, tomorrow night, and if it doesn't happen, blame flips 2008-09-05 00:52 sounds good 2008-09-05 00:52 :) 2008-09-05 00:53 a few of my friends are also interested in tux3 might get them together and get some devel work done. 2008-09-05 00:55 ok...cat working now.. 2008-09-05 00:55 @konrad:: great works.... 2008-09-05 00:56 great work 2008-09-05 00:57 konrad, want your email in the commit message or not? 2008-09-05 00:57 hm? 2008-09-05 00:58 Port of tux3 to fuse, contributed by Conrad Meyer <- for example 2008-09-05 00:58 oh, sure 2008-09-05 00:58 it's in 2008-09-05 00:58 no more version skew ;-) 2008-09-05 00:59 yay 2008-09-05 00:59 added it to make? 2008-09-05 00:59 nope ;-) 2008-09-05 01:00 next 2008-09-05 01:02 tux3 is not in make either... 
2008-09-05 01:03 some lamer didn't put it in I guess 2008-09-05 01:03 heh 2008-09-05 01:03 night night 2008-09-05 01:03 night 2008-09-05 01:04 i am off as well 2008-09-05 01:04 nice development 2008-09-05 01:04 thanks again konrad 2008-09-05 01:04 np 2008-09-05 01:06 -!- cdk(~chinmay@59.95.14.95) has left #tux3 2008-09-05 01:09 ok, makefile is there 2008-09-05 01:09 test isn't 2008-09-05 01:09 patches gratefully accepted 2008-09-05 01:09 there are some warnings to clear up 2008-09-05 01:21 sleepy time 2008-09-05 01:21 today was fun 2008-09-05 01:34 -!- Danjel(~chatzilla@c-a721e255.1143-1-64736c12.cust.bredbandsbolaget.se) has joined #tux3 2008-09-05 01:52 ok, rm doesn't segfault if the file doesn't exist now 2008-09-05 02:06 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-05 02:07 flips: heh you make almost the identical change to the Makefile that I did in that commit 2008-09-05 02:07 was only a few characters off, heh 2008-09-05 02:08 hey flips 2008-09-05 02:10 do you think that the ZFS notion of having sha1 hashes all over the place is sufficient to guarantee the integrity of a volume given that you can fix the corruption from redundant copies ? 2008-09-05 02:10 in place of online checking or any kind of checking ? 2008-09-05 02:12 ooh too bad 2008-09-05 02:13 droppoed off 2008-09-05 02:13 -!- flips(~phillips@phunq.net) has joined #tux3 2008-09-05 02:13 dropped off 2008-09-05 02:13 flips: you there ? 2008-09-05 02:21 sha1 in expensive 2008-09-05 02:21 probably needlessly 2008-09-05 02:21 and unless they are real real careful about the order of operations 2008-09-05 02:21 they still can't guarantee memory/cpu/disk channel corruption won't go undetected 2008-09-05 02:22 s/in/is/ 2008-09-05 02:22 [not that you can ever be 100% sure, if your cpu/memory could be bad - unless you're making a net fs] 2008-09-05 02:23 yeah, they could be summing a bad chunk of memory 2008-09-05 02:24 what else ? 2008-09-05 02:24 what about phantom writes ? 
2008-09-05 02:46 ok night 2008-09-05 03:48 -!- nobody(c8c32a07@67.207.141.120) has joined #tux3 2008-09-05 04:27 -!- nobody(c8c32a07@67.207.141.120) has left #tux3 2008-09-05 05:06 -!- cdk(~chinmay@59.95.49.53) has joined #tux3 2008-09-05 05:07 -!- cdk(~chinmay@59.95.49.53) has left #tux3 2008-09-05 05:17 -!- stargazr5(~gauravstt@59.95.17.124) has joined #tux3 2008-09-05 07:12 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-05 09:36 -!- data(~data@echo489.server4you.de) has joined #tux3 2008-09-05 10:18 -!- vcgomes(~vcgomes@li17-238.members.linode.com) has joined #tux3 2008-09-05 10:35 -!- stargazr5(~gauravstt@59.95.19.213) has joined #tux3 2008-09-05 10:41 where does fuse log to? 2008-09-05 10:41 funny it doesn't just write to console when running on foreground 2008-09-05 10:58 oh it does 2008-09-05 10:58 silly me 2008-09-05 11:16 whoever came up with the readdir interface needs to be badly hurt 2008-09-05 12:07 hey 2008-09-05 12:07 hi 2008-09-05 12:11 moonbase:/more/src/hg/tux3/user/test# ls foo 2008-09-05 12:11 /bar /bar2 /bar3 /bar4 /bar5 /bar6 /foo /foo2 /foo3 /hello 2008-09-05 12:11 not sure what all the slashes are about 2008-09-05 12:14 the kernel filldir interface is the most amazing stinking pile of poo I have ever seen 2008-09-05 12:14 http://lxr.linux.no/linux+v2.6.26.3/fs/readdir.c#L146 2008-09-05 12:14 next time I'm in public I'll wear a bag over my head that says "no I am not a linux kernel hacker" 2008-09-05 12:15 dang it intel 2008-09-05 12:16 i'm looking for a x48 PCIe chipset, and they named theirs x48 2008-09-05 12:16 but it only supports 38 lanes 2008-09-05 12:16 I know I've seen a x48 lane pcie chipset out there 2008-09-05 12:16 bounders] 2008-09-05 12:17 that's a lot of lanes 2008-09-05 12:35 fuse marches on 2008-09-05 12:35 directory listing works now 2008-09-05 12:35 seems to 2008-09-05 12:36 I don't think it should be printing / for every file though 2008-09-05 12:36 wonder why it does that 2008-09-05 
12:36 exercise for konrad ;-) 2008-09-05 12:46 http://hardware.slashdot.org/comments.pl?sid=954803&cid=24889733 2008-09-05 12:47 someone passed on my call for helpers 2008-09-05 12:48 nice 2008-09-05 12:49 btw, I pinged wook for an assist on the visualization we discussed 2008-09-05 12:50 no response yet 2008-09-05 12:57 wooklag 2008-09-05 13:31 wook_lag 2008-09-05 13:31 :-) 2008-09-05 13:57 wow- tux3 got a slashdot lkml bounce 2008-09-05 13:58 ? 2008-09-05 13:58 its #2 on the lkml hottest message list now 2008-09-05 13:59 ah 2008-09-05 13:59 heh 2008-09-05 13:59 speculating that was from the slashdot reference earlier 2008-09-05 13:59 i bet its just flips clicking reload ;) 2008-09-05 14:00 actually probably due to the fact that its been on the front page of kerneltrap for the past couple days 2008-09-05 14:00 well past day 2008-09-05 14:00 could be 2008-09-05 14:38 hullo 2008-09-05 14:42 hi konrad 2008-09-05 14:42 see the extra slashes on the ls output? 2008-09-05 14:42 I did not have time to investigate 2008-09-05 14:42 nope, build error :) 2008-09-05 14:42 tux3fuse.c:251:41: error: macro "fuse_main" requires 4 arguments, but only 3 given 2008-09-05 14:42 fun fun 2008-09-05 14:43 wow 2008-09-05 14:43 back up cheek by jowel with Linus's post 2008-09-05 14:43 lkml? 
2008-09-05 14:43 slashdot blowback 2008-09-05 14:43 http://lkml.org/ 2008-09-05 14:44 konrad, I guess you have a different version of fuse 2008-09-05 14:44 too bad they couldn't keep the interface stable 2008-09-05 14:44 just add the NULL back 2008-09-05 14:44 we'll figure out what to do later 2008-09-05 14:45 konrad, anyway it should be tux3fs.c by now 2008-09-05 14:45 right 2008-09-05 14:45 got to go out for a bit 2008-09-05 14:45 heh, typo 2008-09-05 14:45 it's fux3fs.c 2008-09-05 14:47 whoops 2008-09-05 14:47 hm, getattr seems to be failing here 2008-09-05 14:47 yes, getattr is totally not implemented 2008-09-05 14:47 yet 2008-09-05 17:27 -!- tim_dimm_(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-05 17:32 konrad, which version of libfuse2 do you have? 2008-09-05 17:32 2.7.3 2008-09-05 17:32 2.5.3 here 2008-09-05 17:33 it's a shame they couldn't keep the api stable across 2 point releases 2008-09-05 17:33 I guess we need a #if on version 2008-09-05 17:33 hard to see how that will work 2008-09-05 17:34 heh 2008-09-05 17:34 or I upgrade 2008-09-05 17:34 and we define to work only on recent fuse 2008-09-05 17:34 :p 2008-09-05 17:34 if you feel like it 2008-09-05 17:34 I don't 2008-09-05 17:34 it's a backport to etch 2008-09-05 17:34 could just pass a define to gcc or somit 2008-09-05 17:34 hate fiddling with that 2008-09-05 17:35 yes, but the define has to key on something 2008-09-05 17:36 konrad, can you do ls /usr/lib/libfuse* 2008-09-05 17:37 then have stuff like /usr/include/fuse/fuse_compat.h 2008-09-05 17:38 and /usr/include/fuse/fuse_lowlevel_compat.h 2008-09-05 17:38 flips: you going to try and get this running on fuse ? 2008-09-05 17:38 bh, it's already running on fuse 2008-09-05 17:38 ok 2008-09-05 17:38 konrad did it 2008-09-05 17:38 nice 2008-09-05 17:38 when did konrad come on board ? 
2008-09-05 17:38 now debugging the bugs 2008-09-05 17:38 nice 2008-09-05 17:38 earlier this week 2008-09-05 17:38 nice 2008-09-05 17:38 I think 2008-09-05 17:38 maybe last week 2008-09-05 17:38 I have /usr/include/fuse/fuse_compat.h 2008-09-05 17:39 but no /usr/lib/libfuse 2008-09-05 17:39 ah, redhat puts them somewhere else I suppose 2008-09-05 17:39 ok, out 2008-09-05 17:39 for no good reason 2008-09-05 17:39 well, and 64-bit 2008-09-05 17:39 /usr/lib64 2008-09-05 17:39 ah 2008-09-05 17:39 and what's there? 2008-09-05 17:40 nothing 2008-09-05 17:40 libfuse2 it should be 2008-09-05 17:40 ah 2008-09-05 17:40 /lib64/libfuse.so 2008-09-05 17:40 should be a bunch more 2008-09-05 17:41 ls /usr/lib64/libfuse.so* 2008-09-05 17:41 ls /usr/lib64/libfuse*.so* 2008-09-05 17:42 nope 2008-09-05 17:42 rpm -ql fuse-devel 2008-09-05 17:42 ah, I dimly recall rpm -ql 2008-09-05 17:42 /lib64/libfuse.so 2008-09-05 17:42 /lib64/libulockmgr.so 2008-09-05 17:43 that's it 2008-09-05 17:43 for .so's 2008-09-05 17:43 packaged quite differently 2008-09-05 17:43 and incompatibly 2008-09-05 17:43 I suspect the problem isn't the fuse guys 2008-09-05 17:43 but "creative" redhatters 2008-09-05 17:45 #define FUSE_USE_VERSION 26 <- aha 2008-09-05 17:45 that must have something to do with it 2008-09-05 17:46 ah, does 2.5.3 do version 26? 2008-09-05 17:46 read your fuse/fuse.h 2008-09-05 17:49 #ifndef FUSE_USE_VERSION 2008-09-05 17:49 #define FUSE_USE_VERSION 21 2008-09-05 17:49 #endif <- then there are lots of function mismatches and no .readdir 2008-09-05 17:49 hm 2008-09-05 17:50 #define fuse_main(argc, argv, op) \ 2008-09-05 17:50 fuse_main_real(argc, argv, op, sizeof(*(op))) 2008-09-05 17:50 what have you got there? 
2008-09-05 17:51 477 #define fuse_main(argc, argv, op, user_data) \ 2008-09-05 17:51 478 fuse_main_real(argc, argv, op, sizeof(*(op)), user_data) 2008-09-05 17:51 idiotic api breakage 2008-09-05 17:52 :D 2008-09-05 17:52 well we can grep fuse.h for user_data 2008-09-05 17:52 in the makefile 2008-09-05 17:52 and write nasty notes there 2008-09-05 17:52 or we can ask on their mailing list 2008-09-05 17:53 better style 2008-09-05 17:53 there's probably an irc channel on this server 2008-09-05 17:53 I think I tried actually 2008-09-05 17:53 here and freenode 2008-09-05 17:53 nope 2008-09-05 17:54 http://lists.sourceforge.net/lists/listinfo/fuse-devel 2008-09-05 17:55 konrad, how would you like to be our representative on the fuse-devel list? 2008-09-05 17:55 :-) 2008-09-05 17:55 joining 2008-09-05 17:56 got an idea how to frame the question? 2008-09-05 17:56 "Why did you break the %#& API?" 2008-09-05 17:56 :-) 2008-09-05 17:56 excellent way to get silence 2008-09-05 17:57 right 2008-09-05 17:57 "what is the expected workaround for the extra user_data parameter added to fuse_main? 2008-09-05 18:02 konrad, you know what I'm going to do? 
2008-09-05 18:03 hack my fuse.h 2008-09-05 18:03 and add an extra parameter that is ignored 2008-09-05 18:03 k 2008-09-05 18:03 I'll rev tux3 pretty soon to user the 4 parameter flavor of fuse_main 2008-09-05 18:03 skate first 2008-09-05 18:03 enjoy :) 2008-09-05 18:56 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-05 20:37 konrad, fixed the rename damage and adapted to the 4 arg from or fuse_main 2008-09-05 22:34 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-06 02:29 bleh 2008-09-06 02:30 the mailserver on sf.net rejects senders who don't have a postmaster@ address 2008-09-06 02:33 sending/subscribing from my other address 2008-09-06 03:35 -!- cdk(~chinmay@121.246.36.66) has joined #tux3 2008-09-06 03:43 -!- cdk(~chinmay@121.246.36.66) has left #tux3 2008-09-06 05:01 -!- stargazr5(~gauravstt@59.95.4.235) has joined #tux3 2008-09-06 07:39 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-06 09:04 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-06 11:34 -!- stargazr5(~gauravstt@59.95.21.222) has joined #tux3 2008-09-06 14:34 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-06 15:12 nearly sk8 oclock 2008-09-06 15:12 test earlier these days 2008-09-06 15:12 gets earlier 2008-09-06 15:13 the french girls go back to their hotels earlier too 2008-09-06 15:13 temperature drops slightly below bikini degrees, centigrade 2008-09-06 15:25 we have a response 2008-09-06 15:26 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-06 15:30 For example autoconf that tests for the number of parameters. 2008-09-06 15:30 See http://tinyurl.com/6k6hnm on how this is done. 2008-09-06 15:30 Then again, FUSE 2.5.3 is almost 2 1/2 years old.. 2008-09-06 15:30 """ 2008-09-06 15:40 response? 
2008-09-06 15:40 oh 2008-09-06 15:40 the fuse list :-) 2008-09-06 15:40 autoconf is banned from tux3 2008-09-06 15:41 solutions that avoid autoconf are welcome 2008-09-06 15:41 for example, whatever test autoconf uses 2008-09-06 15:41 we cut & paste 2008-09-06 15:41 stripping out the fluff 2008-09-06 15:42 2.5 years is short on the unix timescale 2008-09-06 15:42 got to think longer term than that 2008-09-06 15:43 nice piece of autoconf wanking 2008-09-06 15:43 has nothing to do with fuse 2008-09-06 15:43 right, I don't like autoconf either 2008-09-06 15:43 hopefully a more clueful response is in the pipe :-) 2008-09-06 15:44 anyway, we have solved it, I hacked my fuse.h 2008-09-06 15:44 alright 2008-09-06 15:44 it's good you are on the fuse list 2008-09-06 15:45 I suspect there are many other things to complain about, most of them more important 2008-09-06 15:47 note that fuse has a parallel mode 2008-09-06 15:47 I actually tried to use fi->fh earlier 2008-09-06 15:47 so once we have it basically working 2008-09-06 15:47 ah 2008-09-06 15:47 but it was failing on the second read 2008-09-06 15:47 maybe I was shoving the wrong pointer in it 2008-09-06 15:47 need to get the fuse source and compile 2008-09-06 15:47 and debug the two together 2008-09-06 15:48 in uml even 2008-09-06 15:48 ok, I can cook up a recipe for that 2008-09-06 15:48 just turning on fuse event tracing in the kernel would be a huge help 2008-09-06 15:48 I wonder if there is an easy way to do that 2008-09-06 15:49 let's see what's in the fuse kernel code ;-) 2008-09-06 15:49 http://lxr.linux.no/linux+v2.6.26.3/fs/fuse/ <- fuse kernel code 2008-09-06 15:50 http://lxr.linux.no/linux+v2.6.26.3/include/linux/fuse.h <- and here 2008-09-06 15:51 http://lxr.linux.no/linux+v2.6.26.3/+ident=14862186 <- FOPEN_KEEP_CACHE for example 2008-09-06 15:52 http://lxr.linux.no/linux+v2.6.26.3/fs/fuse/dir.c#L755 <- /* Directories have separate file-handle space */ 2008-09-06 15:53 FUSE_GETATTR_FH 2008-09-06 15:53 
konrad, better than a pointer is the inum, for now 2008-09-06 15:54 call open_inode 2008-09-06 15:54 we don't really care about performance at this point 2008-09-06 15:54 just working right 2008-09-06 15:54 k 2008-09-06 15:55 the libfuse stuff is important too 2008-09-06 15:55 more important that the kernel I think 2008-09-06 15:55 so we should compile it with -g 2008-09-06 15:55 to set breaks and see how it screws up ;-) 2008-09-06 15:55 let's see, how do you get a debug build in debian 2008-09-06 15:56 weakness of debian 2008-09-06 15:56 this is where gentoo is good 2008-09-06 15:56 but I can just apt-get remove libfuse2 2008-09-06 15:56 and build from tarball like a proper hacker 2008-09-06 16:01 sudo apt-get remove --purge fuse-utils libfuse-dev libfuse2 2008-09-06 16:05 all the pain of autoconf is coming back to me now 2008-09-06 16:05 including library path skew 2008-09-06 16:13 ok, installed, working 2008-09-06 16:13 needlessly painful 2008-09-06 16:14 ld.so.config... not documented in man ld :-P 2008-09-06 16:14 ld.so.conf I meant 2008-09-06 16:14 see? pain 2008-09-06 16:15 aha! it was a fuse bug 2008-09-06 16:15 now does not claim to be busy on umount 2008-09-06 16:16 heh 2008-09-06 16:16 and tux3fs does not exit file file not found when you unmount 2008-09-06 16:16 what version of fuse are you using now? 2008-09-06 16:16 um 2008-09-06 16:16 2.7.4 2008-09-06 16:16 is it leet? 
2008-09-06 16:17 heh 2008-09-06 16:17 more recent anyways 2008-09-06 16:17 I decided not to track their unstable 2008-09-06 16:17 any more than they should track ours 2008-09-06 16:17 for their home dirs ;-) 2008-09-06 16:18 kay, got to get serious about skating 2008-09-06 16:18 tux3 fuse is looking good 2008-09-06 16:32 it's about time to create the version table 2008-09-06 16:33 will be inode number 2, maybe 2008-09-06 16:33 or maybe it deserves to be number 0 2008-09-06 16:33 even more important than bitmap 2008-09-06 16:33 which is after all a redundant structure 2008-09-06 16:34 0: version 1: bitmap 2: extentmap 2008-09-06 16:34 maybe 2008-09-06 16:35 the rule is: inums below 0x10 do not have dirents 2008-09-06 16:35 they are special tux3 files never seen by user space 2008-09-06 18:09 ok, sk8 oclock, really 2008-09-06 18:51 hey 2008-09-06 19:54 hi bh 2008-09-06 20:44 -!- ChanServ changed mode/#tux3 -> +o flips 2008-09-06 20:44 -!- flips changed topic to "Tux3 list members just hit 100! ~ http://tux3.org" 2008-09-06 20:44 -!- flips changed topic to "Tux3 list membership just hit 100! 
~ http://tux3.org" 2008-09-06 20:44 -!- flips changed mode/#tux3 -> -o flips 2008-09-06 23:17 -!- stargazr5(~gauravstt@59.95.22.62) has joined #tux3 2008-09-07 00:24 flips: ping 2008-09-07 00:24 pong 2008-09-07 00:24 hmm, my autoponger seems to be working 2008-09-07 00:25 heh 2008-09-07 00:25 so, tux3fs isn't working for me now 2008-09-07 00:26 whoops 2008-09-07 00:26 i upgraded my fuse 2008-09-07 00:26 let's see if it works for me 2008-09-07 00:26 but when i run tux3fs, it just hangs 2008-09-07 00:26 gdb is your friend 2008-09-07 00:26 # ./tux3fs /tmp/testdev /tmp/test -f 2008-09-07 00:26 devmap_blockio: read [2] 2008-09-07 00:26 devmap_blockio: read [3] 2008-09-07 00:26 lookup inode 0x0, 0 + 0 2008-09-07 00:26 mode 0000000 uid 0 gid 0 root 4:1 ctime 0 size 20 2008-09-07 00:26 lookup inode 0xd, 0 + d 2008-09-07 00:26 mode 0040755 uid 0 gid 0 root 8:1 2008-09-07 00:26 is this make testfs? 2008-09-07 00:27 well yeah, but run manually 2008-09-07 00:27 due to no sudo being installed 2008-09-07 00:27 ls -ld /tmp/test 2008-09-07 00:27 drwxr-xr-x 0 root root 0 Dec 31 1969 /tmp/test 2008-09-07 00:27 although, actually, it does work 2008-09-07 00:27 i just have to be root to see it 2008-09-07 00:28 so what is the hang? <- joke 2008-09-07 00:28 I see 2008-09-07 00:28 as a regular user i get 2008-09-07 00:28 ?--------- ? ? ? ? ? /tmp/test 2008-09-07 00:28 heh 2008-09-07 00:28 right 2008-09-07 00:28 until somebody fixes it 2008-09-07 00:28 ok i'll fix it 2008-09-07 00:28 I'm back to adding new grooviness to tux3 itself 2008-09-07 00:28 thanks 2008-09-07 00:29 post on the version table coming in a few minutes 2008-09-07 00:29 just had a bunch of caffeine, i'll be up a while 2008-09-07 00:29 heh 2008-09-07 00:29 fun 2008-09-07 00:29 I just played them demo of disney's new sick trick quad game, I'm kinda hyped 2008-09-07 00:29 it's good 2008-09-07 00:29 recommended 2008-09-07 00:29 video game? 
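The tentative numbering proposed above can be written down as constants plus a predicate: 0 for the version table, 1 for the bitmap, 2 for the extent map, and no directory entries for anything below 0x10. This is only a sketch. The names are hypothetical, and the assignment was still marked "maybe". The debug output later in the log also looks up the root directory at 0xd, so the root inode presumably needs handling that this predicate does not cover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the tentative layout discussed in the channel. */
enum {
	TUX3_VERSION_INUM    = 0,   /* version table */
	TUX3_BITMAP_INUM     = 1,   /* free space bitmap (a redundant structure) */
	TUX3_EXTENTMAP_INUM  = 2,   /* extent map */
	TUX3_FIRST_USER_INUM = 0x10 /* inums below this never get a dirent */
};

/* True for internal tux3 files that user space should never see. */
static bool is_special_inum(uint64_t inum)
{
	return inum < TUX3_FIRST_USER_INUM;
}
```

A directory operation could call `is_special_inum()` to refuse to create a dirent that points at one of these internal files.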
2008-09-07 00:29 ridiculously over the top quad bike racing/tricks game 2008-09-07 00:29 yes 2008-09-07 00:30 eh 2008-09-07 00:30 you'd like it 2008-09-07 00:30 it's sick 2008-09-07 00:30 not much gore though 2008-09-07 00:30 i can't get an adrenaline rush from video games anymore 2008-09-07 00:30 kinda pointless 2008-09-07 00:30 probably would from this one 2008-09-07 00:31 if i want to entertain myself i can go and do stupid shit in real life 2008-09-07 00:31 .. the advantage of having a 180hp motorcycle ;) 2008-09-07 00:31 you can watch this, then go out and kill yourself completely 2008-09-07 00:31 it's pretty sylish 2008-09-07 00:32 funny thing is, people are actually doing tricks really close to what's inthe game 2008-09-07 00:32 the game was meant to be ridiculous 2008-09-07 00:32 well 2008-09-07 00:32 real quads don't have helicopters flying beside them at the top of a jump 2008-09-07 00:35 they could 2008-09-07 05:10 -!- stargazr5(~gauravstt@59.95.24.129) has joined #tux3 2008-09-07 05:14 -!- cdk(~chinmay@121.246.36.66) has joined #tux3 2008-09-07 05:19 -!- cdk(~chinmay@121.246.36.66) has left #tux3 2008-09-07 06:56 -!- dipanjan(~chatzilla@122.167.27.144) has joined #tux3 2008-09-07 09:42 -!- dipanjan(~chatzilla@122.167.27.144) has joined #tux3 2008-09-07 11:19 -!- pgquiles(~pgquiles@253.Red-83-44-239.dynamicIP.rima-tde.net) has joined #tux3 2008-09-07 15:56 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-07 16:11 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-07 16:13 another big design note up 2008-09-07 16:14 in case anybody thinks these take longer to read that to write, I can assure you that is not the case ;-) 2008-09-07 16:16 http://slashdot.org/comments.pl?sid=954803&cid=24892667 <- warm n fuzzy 2008-09-07 16:20 flips: i still don't see your design note 2008-09-07 16:21 ah just delivered.. 
2008-09-07 16:21 slow 2008-09-07 16:22 flips: i emailed a patch, and actually just commited a fix for a non-bug (yet) in it in my repo 2008-09-07 16:24 actually this is dumb: 2008-09-07 16:24 return inode ? 0 : -1; 2008-09-07 16:25 return -same as: 2008-09-07 16:25 return -!inode; 2008-09-07 16:27 you mean -!!inode? 2008-09-07 16:27 oh, nevermind 2008-09-07 16:30 back 2008-09-07 16:30 flips: how many bits for atom numbers? 2008-09-07 16:30 variable 2008-09-07 16:30 start with 16 bits for small ones 2008-09-07 16:31 and have a bigger variant 2008-09-07 16:31 with say 48 bits 2008-09-07 16:31 or 32 2008-09-07 16:31 don't have to get silly 2008-09-07 16:31 this may be painfully obvious 2008-09-07 16:31 but, why limit that approach to just xattrs? 2008-09-07 16:32 because nearly every file has a data attribute 2008-09-07 16:32 i was talking about this for sharing user, group, mode 2008-09-07 16:32 and mutliple versions of it 2008-09-07 16:32 ah 2008-09-07 16:32 of course 2008-09-07 16:32 bundle them all together 2008-09-07 16:32 what I meant in my post 2008-09-07 16:32 how bundle? 2008-09-07 16:33 i mean, put the user, group, and mode in the atom table too 2008-09-07 16:33 along with xattrs 2008-09-07 16:33 sure 2008-09-07 16:33 good idea 2008-09-07 16:33 separately or combined? 2008-09-07 16:33 the latter gets into content addressing 2008-09-07 16:33 probably combined 2008-09-07 16:33 more efficient that way 2008-09-07 16:34 more corner cases 2008-09-07 16:34 howso 2008-09-07 16:35 maxgid * maxuid * anymdode = huge 2008-09-07 16:36 yeah but that is a vrey unlikely case 2008-09-07 16:36 no reason to pay that max price for every file if you have only 10 or so on most systems 2008-09-07 16:36 how about a list post? 
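The two return forms above are equivalent. `!inode` is 1 for a null pointer and 0 otherwise, so negating it gives -1 on failure and 0 on success. The short check below uses a stand-in `struct inode` (hypothetical, for illustration only):

```c
#include <assert.h>
#include <stddef.h>

struct inode { int unused; }; /* stand-in type, not the tux3 struct */

/* Explicit form: 0 if we got an inode, -1 if not. */
static int status_ternary(struct inode *inode)
{
	return inode ? 0 : -1;
}

/* Terse form: !inode is 1 for NULL and 0 otherwise, so -!inode is -1 or 0. */
static int status_negnot(struct inode *inode)
{
	return -!inode;
}
```

By contrast, `-!!inode` yields -1 for a valid inode and 0 for NULL, which is the opposite sense. That is why the suggestion was withdrawn.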
2008-09-07 16:36 yeah, working on it ;) 2008-09-07 16:36 recalling my one-liner from irc logs ;) 2008-09-07 16:37 find / -xdev -type d -exec sh -c 'ls -l $1 | awk "/^\-/ {print \$1}" | sort -u |wc -l' {} {} \; | sort | uniq -c 2008-09-07 16:37 heh 2008-09-07 16:37 another idea: record the actual user name in the fs, not just the id 2008-09-07 16:37 so that security will work when you copy a volume to a different system 2008-09-07 16:38 or at least some kind of mapping table 2008-09-07 16:38 which might well be an atom table 2008-09-07 16:39 shapor: what's that bit of shell do? 2008-09-07 16:39 using the atom table we could map 32 bit uid and gid down to 16 or even 8 bits 2008-09-07 16:39 its not quite what i wanted 2008-09-07 16:39 but it basically breaks down the number of unique permissions sets per directory on the system 2008-09-07 16:40 rather, it creates a distribution of uniqueness of permissions in a single directory 2008-09-07 16:41 shapor, is your tux3fs patch ready to go in, should I pull? 2008-09-07 16:42 yeah 2008-09-07 16:44 whoops, I got 3 heads somehow 2008-09-07 16:45 damn, there they are 2008-09-07 16:46 sitting in your repo, I did view but didn't look all the way down 2008-09-07 16:46 now how do you revert a pull 2008-09-07 16:46 I guess this is a big flaming deficiency in hg 2008-09-07 16:47 ok, hg rollback works 2008-09-07 16:47 shapor, could you merge your heads before I pull? 
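Putting user, group and mode into the atom table amounts to interning. Each distinct (uid, gid, mode) combination is stored once, and an inode refers to it by a small atom instead of carrying full 32-bit ids. The sketch below is a minimal in-memory version with hypothetical names. It uses a fixed-capacity linear table; a real atom table would be persistent and indexed (the channel mentions ext2 directory format for xattr atoms), not scanned linearly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One distinct (uid, gid, mode) combination. Hypothetical layout. */
struct owner {
	uint32_t uid, gid;
	uint16_t mode;
};

#define MAX_ATOMS 1024 /* illustration only; 16-bit atoms allow up to 65536 */

/* Hypothetical in-memory atom table; a real one would live on disk. */
struct atom_table {
	unsigned count;
	struct owner entry[MAX_ATOMS];
};

/* Return the atom for this combination, adding it if new; -1 if the table is full. */
static int32_t atom_intern(struct atom_table *table, struct owner owner)
{
	for (unsigned i = 0; i < table->count; i++) {
		struct owner *e = &table->entry[i];
		if (e->uid == owner.uid && e->gid == owner.gid && e->mode == owner.mode)
			return (int32_t)i;
	}
	if (table->count == MAX_ATOMS)
		return -1;
	table->entry[table->count] = owner;
	return (int32_t)table->count++;
}

/* Map an atom back to the full attributes, e.g. when loading an inode. */
static const struct owner *atom_lookup(const struct atom_table *table, uint16_t atom)
{
	return atom < table->count ? &table->entry[atom] : NULL;
}
```

This follows the "combined" option from the discussion. The worst case (every uid times every gid times every mode) is huge, but a typical system has only a handful of distinct combinations, so each inode pays only for a small atom.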
2008-09-07 16:48 how do i do that 2008-09-07 16:48 "hg merge" 2008-09-07 16:48 if it won't merge, tell it the version you want merged 2008-09-07 16:48 wish there was a simple delete head 2008-09-07 16:48 abort: there is nothing to merge, just use 'hg update' or look at 'hg heads' 2008-09-07 16:49 right, you need to tell it exactly what head 2008-09-07 16:49 start with raponen's patch you added independently 2008-09-07 16:49 that might have conflicts 2008-09-07 16:49 because I adjusted a little 2008-09-07 16:49 hg is broken here 2008-09-07 16:51 http://www.selenic.com/pipermail/mercurial/2006-September/010628.html "Delete a branch/head" 2008-09-07 16:51 hm ok 2008-09-07 16:51 merged my old heads 2008-09-07 16:51 now there is only one 2008-09-07 16:51 i think all is well with my repo 2008-09-07 16:52 "added 12 changesets with 11 changes to 3 files" 2008-09-07 16:53 so I got all your braindumps with that pull too ;-) 2008-09-07 16:53 well they are now carved in the history of tux3 for eternity 2008-09-07 16:54 "You can use either: 2008-09-07 16:54 hg clone -r oldrepo newrepo 2008-09-07 16:54 (safest method) 2008-09-07 16:54 or hg strip 2008-09-07 16:54 (does in repo stripping, you need mq extension activated to have the 2008-09-07 16:54 strip command)" http://www.selenic.com/pipermail/mercurial/2006-September/010628.html 2008-09-07 16:55 need to get in the habit of preparing a clean repo for pulling I think 2008-09-07 16:55 it can be automated 2008-09-07 16:55 or maybe In can tell pull to just pull the tip 2008-09-07 16:55 any reason for choosing hg over git? 
2008-09-07 16:56 it's _way_ nicer to use 2008-09-07 16:56 git has a bunch of fuzzy thinking that nobody fixed just because it came from linus 2008-09-07 16:56 it's about exactly as fast, in spite of being written in super slow python 2008-09-07 16:57 I wasn't questioning the speed 2008-09-07 16:58 shapor, hg has pull -r, which I can use if I'm alert to see extra heads 2008-09-07 16:58 but I think the better approach is to clone the version you want me to pull, so I always pull from the same place, and you decide exactly what I pull 2008-09-07 16:58 knew you weren't 2008-09-07 16:58 alright :) 2008-09-07 16:58 it's just amazing that mercurial is as fast as it is, in spite of being hobbled by python 2008-09-07 16:59 matt mackall - amazing hacker 2008-09-07 16:59 heh 2008-09-07 17:00 I'd bet it either employs a lot of the stuff python is fast at (because underneath it's in C) or it employs its own C extensions 2008-09-07 17:00 python's dicts are really speedy, for example 2008-09-07 17:00 shapor, one thing I forgot to mention in my xattr post - we are going to use ext2 dirops for the atom table for the time being ;-) 2008-09-07 17:01 konrad, all of that 2008-09-07 17:01 plus it's a better design 2008-09-07 17:01 I bet it would kick git's tail if it were converted to c++ 2008-09-07 17:01 heh 2008-09-07 17:02 sk8 oclock 2008-09-07 17:02 whoo! 2008-09-07 17:02 enjoy 2008-09-07 17:02 we will 2008-09-07 17:02 going to meet shap for cocktails on the strand 2008-09-07 17:02 skating under the influence is fun and legal 2008-09-07 17:02 so far 2008-09-07 17:06 heh 2008-09-07 19:24 relatively legal anyway 2008-09-07 19:34 this channel is logged :) 2008-09-07 19:48 by me ;) 2008-09-07 20:07 heh 2008-09-07 20:07 me too, but it's not like I'm publicizing them 2008-09-07 20:07 konrad, completely legal 2008-09-07 20:07 unless of course you take out a baby buggy, just don't do that 2008-09-07 20:08 shapor, what say we put the logs up on tux3.org? 
2008-09-07 20:09 and let googlebot chew on them 2008-09-07 20:09 help somebody learn something about skating maybe 2008-09-07 20:11 ok, howbig has to know how to include the size of variable sized items now 2008-09-07 20:12 I suppose easiest is to make the size of variable sized items a generic field 2008-09-07 20:13 something like [ kind:version:16, atom:16, size:16, data[size] } 2008-09-07 20:14 or for a file data attribute: [ kind:version:16, size:16, data[size] } 2008-09-07 20:14 hmm 2008-09-07 20:14 might be better to include the atom in the data 2008-09-07 20:14 { kind:version:16, size:16, atom:16 data[size - 2] } 2008-09-07 20:15 ok that seems better 2008-09-07 20:15 I'll make it so unless somebody has a better idea 2008-09-07 20:15 0x1dea <- great magic number, got to use it for something 2008-09-07 20:51 "howmuch" makes its grand return 2008-09-07 20:52 howbig -> size of fixed attributes, howmuch -> size of variable attributes 2008-09-07 22:30 flips: googlebot is already chewing on them 2008-09-07 22:31 they are linked off shapor.com/tux3 2008-09-07 22:31 ah good 2008-09-07 22:31 lets see what comes up for tux3 sk8 2008-09-07 22:31 nothing 2008-09-07 22:31 wtf is up with googlebot 2008-09-07 22:31 bad googlebot 2008-09-07 22:50 hm the lots are .txt, dunno if that has any effect 2008-09-07 23:13 flips: ah you pulled from me this time so i didn't have to merge my own changes, heh 2008-09-07 23:53 :-) 2008-09-08 00:13 hey 2008-09-08 00:13 how's it going ? 2008-09-08 00:14 ACTION just made it back to San Diego 2008-09-08 00:32 hey 2008-09-08 00:32 extended attributes are in process of being implemented 2008-09-08 00:32 only the disk format had any kind of design as of yesterday 2008-09-08 00:36 static struct xattrs foo = { .size = sizeof(foo), .list = { 2008-09-08 00:36 { .atom = 666, .size = 5, .data = "hello" }, 2008-09-08 00:36 { .atom = 777, .size = 6, .data = "world!" 
}, 2008-09-08 00:36 } }; 2008-09-08 02:46 the gates/seinfeld ad is awful, omgponies 2008-09-08 02:46 I couldn't keep watching it 2008-09-08 02:46 I've never seen anything that bad, not even close 2008-09-08 02:47 what were they thinking 2008-09-08 02:47 and how much did it cost ;-) 2008-09-08 02:48 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3 2008-09-08 02:54 agreed 2008-09-08 03:21 ext3 delete speed really is pathetic 2008-09-08 03:22 that's one thing we must do much better 2008-09-08 03:22 and we will 2008-09-08 03:22 going to be fun when we get to the transaction handling part 2008-09-08 03:50 I found newer old skates! 2008-09-08 03:50 these are only a half size smaller than what I currently wear 2008-09-08 03:50 I think 2008-09-08 03:51 :-) 2008-09-08 03:51 it's spreading 2008-09-08 03:51 isn't it a little late over there? 2008-09-08 03:51 or early? 2008-09-08 03:51 same timezone as you silly 2008-09-08 03:51 oh, I got the idea it was mn 2008-09-08 03:52 ah, comcast's rdns is broken 2008-09-08 03:52 they kinda fit 2008-09-08 03:55 it's late all the same 2008-09-08 03:55 I'd better crash 2008-09-08 03:55 xattr hacking is going slowly 2008-09-08 03:56 slowly is better than not at all 2008-09-08 03:57 true 2008-09-08 03:57 it's going to be good I think 2008-09-08 03:58 I expect better that average attriburte cache performance 2008-09-08 04:00 with this much effort invested I don't doubt it 2008-09-08 04:00 ok these skates kinda hurt my bones after a little wearing 2008-09-08 04:00 but so do ski boots so what's new 2008-09-08 04:00 heh, it's tiny compared to the effort invested in any other fs I know of 2008-09-08 04:01 you can get decent skates online for $200 2008-09-08 04:01 right 2008-09-08 04:01 that don't hurt 2008-09-08 04:01 but I have no income at present 2008-09-08 04:01 student 2008-09-08 04:01 and these aren't too bad 2008-09-08 04:01 ah, go crazy on your skates then 2008-09-08 04:01 been them up 2008-09-08 04:01 I'm busily destroying mine 
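The variable-sized item layout settled on earlier, `{ kind:version:16, size:16, atom:16 data[size - 2] }`, puts the atom inside the sized payload. The total size of an item ("howmuch") is then the 4-byte header plus `size`. The sketch below encodes such an item and measures it. It rests on several assumptions that the discussion did not fix: big-endian header fields, a 4-bit kind in the high bits of the first word, and a 12-bit version in the rest. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED split of the 16-bit kind:version word; not specified in the channel. */
#define KIND_SHIFT 12
#define VERSION_MASK 0x0fff

/* ASSUMED big-endian header fields. */
static void put_be16(unsigned char *p, uint16_t v)
{
	p[0] = v >> 8;
	p[1] = v & 0xff;
}

static uint16_t get_be16(const unsigned char *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);
}

/*
 * Encode { kind:version:16, size:16, atom:16, data[size - 2] }.
 * The size field counts the atom plus the data bytes.
 * Returns a pointer just past the encoded item.
 */
static unsigned char *encode_xattr_item(unsigned char *p, unsigned kind, unsigned version,
					uint16_t atom, const void *data, uint16_t len)
{
	put_be16(p, (uint16_t)(kind << KIND_SHIFT | (version & VERSION_MASK)));
	put_be16(p + 2, (uint16_t)(len + 2));
	put_be16(p + 4, atom);
	memcpy(p + 6, data, len);
	return p + 6 + len;
}

/* "howmuch": total bytes of a variable-sized item, header included. */
static unsigned howmuch(const unsigned char *item)
{
	return 4 + get_be16(item + 2);
}
```

Because every variable-sized item carries its own size in the same place, a walker can skip items of unknown kind without having to understand them.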
2008-09-08 04:01 need new ones pretty soon 2008-09-08 04:02 heh 2008-09-08 04:02 I'll take em for a spin tomorrow 2008-09-08 04:02 seattle's kind of hilly for skates 2008-09-08 04:02 I used to bike a bit, which seems easier on hills 2008-09-08 04:05 static struct xattrs foo = { .blob = { 2008-09-08 04:05 { .code = 666, .size = 6, .data = "hello" }, 2008-09-08 04:05 { .code = 777, .size = 7, .data = "world!" }, 2008-09-08 04:05 } }; 2008-09-08 04:06 cache form of immediate xattrs 2008-09-08 04:06 how do we know how big .blob is? 2008-09-08 04:06 struct xattrs { unsigned size; struct xattr { u16 code, size; char data[]; } blob[]; } PACKED; 2008-09-08 04:06 we count it when loading the inode 2008-09-08 04:06 right, your struct above didn't mention size 2008-09-08 04:06 there is a .size field 2008-09-08 04:06 right 2008-09-08 04:06 because C is too braindamaged to calculated it 2008-09-08 04:07 mhm 2008-09-08 04:07 does very much the wrong thing when you ask it to do something reasonable 2008-09-08 04:07 the linux kernel isn't written in ocaml though 2008-09-08 04:07 static struct xattrs foo = { .size = sizeof(foo), .blob = { <- ought to work 2008-09-08 04:07 but it does not 2008-09-08 04:07 jw, what does it do? 2008-09-08 04:08 it uses offsetof(struct xattrs, blob) 2008-09-08 04:08 some wanking about flexible arrays 2008-09-08 04:08 ah 2008-09-08 04:08 but the compiler bloody initialized the thing to a certain size and should use that as sizeof 2008-09-08 04:09 flaw in C imho 2008-09-08 04:09 C or GCC? 2008-09-08 04:09 committee misfeature 2008-09-08 04:09 standard 2008-09-08 04:12 anyway: 2008-09-08 04:12 xattr 666: 0x804a37c: 68 65 6c 6c 6f 00 "hello." 2008-09-08 04:12 xattr 777: 0x804a386: 77 6f 72 6c 64 21 00 "world!." 
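The cache form above stores `{ code, size, data[size] }` records back to back. Declaring `blob[]` as an array of structs that each end in a flexible array member is a GCC extension. ISO C does not allow such a struct to be an array element, and `sizeof` on it ignores the flexible member, which explains the `.size = sizeof(foo)` surprise. A more portable approach treats the blob as bytes and steps from header to header. The sketch below does that, with hypothetical names. It also rejects zero-length or overrunning records, the kind of garbage `dump_xattrs` reported.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Header of one cached xattr; the data bytes follow immediately. */
struct xattr_hdr {
	uint16_t code, size;
};

/* Append one record at byte offset pos; returns the new end offset. */
static unsigned xcache_append(unsigned char *buf, unsigned pos, uint16_t code,
			      const void *data, uint16_t size)
{
	struct xattr_hdr hdr = { code, size };
	memcpy(buf + pos, &hdr, sizeof hdr);
	memcpy(buf + pos + sizeof hdr, data, size);
	return pos + sizeof hdr + size;
}

/*
 * Walk a packed buffer of records, calling visit() (if non-NULL) for each.
 * Returns the record count, or -1 if a record is zero length or runs past
 * the end of the buffer, both signs of garbage.
 */
static int xcache_walk(const unsigned char *buf, unsigned len,
		       void (*visit)(uint16_t code, const unsigned char *data,
				     uint16_t size, void *ctx),
		       void *ctx)
{
	unsigned pos = 0;
	int count = 0;
	while (pos < len) {
		struct xattr_hdr hdr;
		if (len - pos < sizeof hdr)
			return -1;
		memcpy(&hdr, buf + pos, sizeof hdr); /* avoids unaligned access */
		pos += sizeof hdr;
		if (hdr.size == 0 || hdr.size > len - pos)
			return -1;
		if (visit)
			visit(hdr.code, buf + pos, hdr.size, ctx);
		pos += hdr.size;
		count++;
	}
	return count;
}
```

As in the dump output above, the sizes in the test include the trailing NUL: 6 for "hello" and 7 for "world!".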
2008-09-08 04:12 dump_xattrs: zero length xattr 2008-09-08 04:12 that's what it does if there is garbage in it 2008-09-08 04:14 the plan is to walk across the inode attrs filling in a vector with pointer to each xattr encountered 2008-09-08 04:15 then add up the sizes of all the xattrs, allocate memory big enough, and copy the xattr data into the cache struct shown above 2008-09-08 04:17 the xattr cache vector will be a binary sized multiple 2008-09-08 04:17 so there is some slack space, some of the time, to store more xattrs in it 2008-09-08 04:18 but no big deal to just realloc to the next binary size up when necessary 2008-09-08 04:18 better than a linked list 2008-09-08 04:19 immediatel file data probably better go in this cache struct too 2008-09-08 04:19 though it can possibly also go in the page cache 2008-09-08 04:19 it will go there 2008-09-08 04:20 but if the page gets evicted, I think we want to be able to go back to the in-memory inode to repopulate the page cache, rather than going all the way back to the inode table block 2008-09-08 04:20 this means there will be some triple caching of immediate file data: 1) in the inode table block 2) in the xattr cache 3) in the page cache 2008-09-08 04:21 seems a little excessive 2008-09-08 04:21 got to think about that 2008-09-08 04:22 we also have double caching of xattr data: 1) in the inode table block and 2) in the page cache 2008-09-08 04:22 don't really like that either 2008-09-08 04:22 sorry 2008-09-08 04:23 we also have double caching of xattr data: 1) in the inode table block and 2) in the inode's xattr cache 2008-09-08 04:23 what might make more sense is to pin the inode table block in memory and have the inode point into the block buffer 2008-09-08 04:24 so if the inode is evicted by the vm, it drops its count on the inode table block, which may now be evicted if no other inode holds a count on it 2008-09-08 04:25 if we do that, xattr caching looks maybe more sensible 2008-09-08 04:26 if attributes are 
heavily versioned though, we may pin a log of inode table blocks in memory, which hold a very low density of data we are actually using 2008-09-08 04:26 --sensible 2008-09-08 04:27 I am probably overworried about this double/tripe caching issue 2008-09-08 04:27 we have that anyway with existing inode attributes 2008-09-08 04:29 the triple caching of immediate data can be improved to just double caching by not loading immediate file data into the attribute cache. I dunno. 2008-09-08 04:29 instead, we let a miss in the page cache to retrieve the inode table block and load the immediate data into the page cache 2008-09-08 04:30 ok, this is right 2008-09-08 04:31 then when we update the inode on disk, we store an immediate data attribute only if the page cache is a) dirty and b) still small enough to be an immediate data attribute 2008-09-08 07:17 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3 2008-09-08 07:17 -!- pgquiles(~pgquiles@253.Red-83-44-239.dynamicIP.rima-tde.net) has joined #tux3 2008-09-08 07:17 -!- vcgomes(~vcgomes@li17-238.members.linode.com) has joined #tux3 2008-09-08 07:17 -!- data(~data@echo489.server4you.de) has joined #tux3 2008-09-08 07:17 -!- flips(~phillips@phunq.net) has joined #tux3 2008-09-08 07:17 -!- shapor(~shapor@yzf.shapor.com) has joined #tux3 2008-09-08 07:17 -!- konrad(~konrad@c-24-16-74-109.hsd1.mn.comcast.net) has joined #tux3 2008-09-08 07:17 -!- bh(~billh@ip68-107-26-122.sd.sd.cox.net) has joined #tux3 2008-09-08 09:00 -!- stargazr5(~gauravstt@59.95.36.185) has joined #tux3 2008-09-08 10:07 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-08 11:24 so where were we 2008-09-08 11:24 xattrs 2008-09-08 11:25 so we are going to have this little xattr cache hanging off each inode, which is just a binary size chunk of memory a la kmalloc 2008-09-08 11:25 and xattrs get loaded into it when the inode is loaded, and written into it when somebody sets an xaddr, marking the inode 'xattr dirty' 2008-09-08 11:26 meaning 
the xattr cache has to be written back to the inode table block when the inode is saved/synced 2008-09-08 11:27 xattr atoms are stored on disk in directory format, specifically ext2 directory format for now. So we do an ext2_find_entry to resolve an xattr name to an atom on sys_getxattr or sys_setxattr 2008-09-08 11:28 then search the inode's xattr cache for that atom 2008-09-08 11:29 if ext2_find_entry fails for sys_setxattr, then we do ext2_create_entry 2008-09-08 11:30 for now, we are always going to load all xattrs when an inode is loaded and write all xattrs when the inode is saved 2008-09-08 11:30 later, especially with versioning, we don't want to load all xattrs every time 2008-09-08 11:31 so getxattr will first search the cache, then go to the inode table block if that fails 2008-09-08 11:31 and setxattr will have to scan the xattrs present in the inode to know which ones to keep and which ones to overwrite on save 2008-09-08 11:32 the new size of a saved inode will then not be completely determinable from examining the inode alone, as it is now 2008-09-08 11:32 so we get a little more complexity here, not too bad 2008-09-08 11:33 the initial implementation of load all, save all, no versioning will be pretty simple and fast 2008-09-08 11:35 always calling ext2_find_entry for each xaddr atom can be avoided by keeping a hash of xattr atoms, and we only do the find_entry on a miss in the xattr atom hash, or we always keep the xattr hash fully populated for now (there are usually only a few different kinds of xattrs) 2008-09-08 11:43 -!- elicriffield(~elicriffi@66.249.86.209) has joined #tux3 2008-09-08 11:58 hey 2008-09-08 12:07 hey 2008-09-08 12:09 hi eli 2008-09-08 12:09 it's xattrs day today ;-) 2008-09-08 12:09 oh fun :) 2008-09-08 12:09 yah, not the most exciting, exciting to some folks though 2008-09-08 12:10 man im just trying to stay awake today 2008-09-08 12:10 I guess a reasonable approach is emerging, should have something working by tomorrow say 
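The plan above is to keep each inode's xattrs in one chunk whose size is a power of two, leaving some slack. The chunk is reallocated to the next binary size only when it fills. The user-space sketch below uses `realloc` in place of kmalloc/krealloc. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Smallest power of two >= n, mirroring kmalloc-style size classes. */
static size_t binary_size(size_t n)
{
	size_t size = 1;
	while (size < n)
		size <<= 1;
	return size;
}

/* Hypothetical per-inode xattr cache. */
struct xcache {
	size_t used;          /* bytes of xattr records in use */
	size_t alloc;         /* allocated bytes, always 0 or a power of two */
	unsigned char *buf;
};

/* Append bytes, growing to the next binary size only when the slack runs out. */
static int xcache_add(struct xcache *cache, const void *bytes, size_t len)
{
	size_t need = cache->used + len;
	if (need > cache->alloc) {
		size_t alloc = binary_size(need);
		unsigned char *buf = realloc(cache->buf, alloc);
		if (!buf)
			return -1; /* old buffer is still valid */
		cache->buf = buf;
		cache->alloc = alloc;
	}
	memcpy(cache->buf + cache->used, bytes, len);
	cache->used = need;
	return 0;
}
```

Doubling the size keeps the number of reallocations logarithmic in the cache size. A single contiguous buffer is also easier to copy back into the inode table block than a linked list would be.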
2008-09-08 12:10 right, it's one of those for me too 2008-09-08 12:13 make iattr && ./iattr 2008-09-08 12:13 gcc -std=gnu99 -Wall -g iattr.c -o iattr 2008-09-08 12:13 xattr 666: 0x804a37c: 68 65 6c 6c 6f 00 "hello." 2008-09-08 12:13 xattr 777: 0x804a386: 77 6f 72 6c 64 21 00 "world!." 2008-09-08 12:13 two xattrs 2008-09-08 12:13 in an xcache ;-) 2008-09-08 12:13 now to wrap that with lots of xattr access yumminess 2008-09-08 12:20 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-08 12:28 ACTION reads shapors diversity post 2008-09-08 13:04 -!- SEJeff(~jeff__@66.151.59.138) has joined #tux3 2008-09-08 13:42 -!- data(~data@echo489.server4you.de) has joined #tux3 2008-09-08 14:37 flips: 2008-09-08 14:37 You don't even need to play with autoconf, just do 2008-09-08 14:37 #define FUSE_USE_VERSION 26 2008-09-08 14:37 #include <fuse.h> 2008-09-08 14:37 ... 2008-09-08 14:37 #if FUSE_VERSION >= 26 2008-09-08 14:38 fuse_main(argc, argv, &my_op, NULL); 2008-09-08 14:38 #else 2008-09-08 14:38 fuse_main(argc, argv, &my_op); 2008-09-08 14:38 #endif 2008-09-08 14:38 But all this is only important if you need some API features from 2008-09-08 14:38 2.6.x/2.7.x. Otherwise you can just use the old API unconditionally: 2008-09-08 14:38 #define FUSE_USE_VERSION 25 2008-09-08 14:38 #include <fuse.h> 2008-09-08 15:00 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-08 16:56 ok, xattr cache lookup works 2008-09-08 16:56 konrad, we now do #define FUSE_USE_VERSION 27 2008-09-08 16:56 aj 2008-09-08 16:57 but good suggestion 2008-09-08 16:57 nice solution 2008-09-08 16:57 alright 2008-09-08 16:57 I should have xattr support in tux3.c by tomorrow 2008-09-08 16:58 then we can try it out in fuse 2008-09-08 16:58 fun 2008-09-08 16:58 I don't think we have functions for that yet 2008-09-08 16:58 not in tux3.c 2008-09-08 16:58 writing them now 2008-09-08 16:58 or rather, not in inode.c 2008-09-08 16:59 I meant in tux3fs.c 2008-09-08 16:59 maybe tux3fuse.c but I havn't looked at it much 2008-09-08 16:59 xcache_dump and xcache_lookup work, now writing xcache_update, which is considerably harder 2008-09-08 16:59 I wonder what passes for an xattr delete 2008-09-08 17:00 setxattr to empty? 2008-09-08 17:00 the low level fuse api has it 2008-09-08 17:14 alright 2008-09-08 17:15 touch: setting times of `tmp/abc': Function not implemented 2008-09-08 17:15 (high level api) 2008-09-08 17:17 I know 2008-09-08 17:17 don't know what it is 2008-09-08 17:17 sniffed at it a little 2008-09-08 17:17 libc braindamage it seems 2008-09-08 17:18 triggered by some combination of fuse things 2008-09-08 17:19 konrad, I suggest asking on the fuse list 2008-09-08 17:24 hm?
I think we just havn't implemented one of the functions fuse wants us to 2008-09-08 17:24 might be 2008-09-08 17:25 Function 'name' not implemented would be much more informative 2008-09-08 17:31 hey tim_dimm 2008-09-08 17:31 hey flips 2008-09-08 17:35 shapor: ping 2008-09-08 17:40 xcache_delete works 2008-09-08 17:48 xcache_update works 2008-09-08 17:49 now some memory management to take care of changing the size of the xcache as necessary 2008-09-08 17:49 but first, a skate 2008-09-08 17:50 it's looking quite like the tux3 command will have get/set xattr by tomorrow 2008-09-08 19:43 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-08 19:59 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-08 22:53 time for a checkin 2008-09-08 23:16 been 30 hours :( 2008-09-08 23:33 folks 2008-09-09 00:04 yo 2008-09-09 00:17 ACTION starts writing encode_xattrs 2008-09-09 00:24 hmm, issue 2008-09-09 00:24 what endianness of xattr data? 2008-09-09 00:25 I suppose the filesystem does not care 2008-09-09 00:26 if the application cares, it better take care of it 2008-09-09 00:28 flips: unlink a 0 length file causes an infinite loop heh 2008-09-09 00:28 heh 2008-09-09 00:29 fixed yet? 2008-09-09 00:29 free <- [ffffdddddddddddd] 2008-09-09 00:29 bfree: block 0xffffdddddddddddd already free! 2008-09-09 00:29 free <- [ffffdddddddddddd] 2008-09-09 00:29 bfree: block 0xffffdddddddddddd already free! 2008-09-09 00:29 ACTION thinks about why that might be 2008-09-09 00:29 those dddddddd are uninited data 2008-09-09 00:29 shapor: that happens to me a lot 2008-09-09 00:29 bad 2008-09-09 00:29 you should complain more 2008-09-09 00:30 :D 2008-09-09 00:30 does tree_chop do the right thing if there is no data int he btree? 
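One plausible way to get xcache_delete and xcache_update on a packed record buffer is sketched below. Delete slides the records that follow down over the victim with `memmove`. Update is then delete followed by append. This is not necessarily how the tux3 functions work. The names and record layout (`u16 code, size` followed by the data) are assumptions carried over from the cache struct discussed earlier. The caller is assumed to have ensured capacity, for example with the binary-size growth planned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct xattr_hdr { uint16_t code, size; }; /* ASSUMED record header */

/* Byte offset of the record with this code, or -1 if absent. */
static long xcache_find(const unsigned char *buf, size_t used, uint16_t code)
{
	size_t pos = 0;
	while (pos + sizeof(struct xattr_hdr) <= used) {
		struct xattr_hdr hdr;
		memcpy(&hdr, buf + pos, sizeof hdr);
		if (hdr.code == code)
			return (long)pos;
		pos += sizeof hdr + hdr.size;
	}
	return -1;
}

/* Delete by sliding the following records down; returns the new used size. */
static size_t xcache_remove(unsigned char *buf, size_t used, uint16_t code)
{
	long at = xcache_find(buf, used, code);
	if (at < 0)
		return used;
	struct xattr_hdr hdr;
	memcpy(&hdr, buf + at, sizeof hdr);
	size_t len = sizeof hdr + hdr.size;
	if ((size_t)at + len > used)
		return used; /* record overruns the buffer: refuse to touch it */
	memmove(buf + at, buf + at + len, used - (size_t)at - len);
	return used - len;
}

/* Update = remove any old value, then append the new one at the end.
 * The caller must guarantee room for sizeof(hdr) + size more bytes. */
static size_t xcache_set(unsigned char *buf, size_t used, uint16_t code,
			 const void *data, uint16_t size)
{
	struct xattr_hdr hdr = { code, size };
	used = xcache_remove(buf, used, code);
	memcpy(buf + used, &hdr, sizeof hdr);
	memcpy(buf + used + sizeof hdr, data, size);
	return used + sizeof hdr + size;
}
```

Appending on update changes the record order, which is harmless if lookups always go by code.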
2008-09-09 00:30 hm 2008-09-09 00:30 and by 'that' I mean the infinite loop with the exact same values 2008-09-09 00:30 or is the way we're creating files in fuse broken 2008-09-09 00:31 /* let's clear out the buffer array and data and set to deadly data 0xdd */ 2008-09-09 00:31 memset(data_pool, 0xdd, max_buffers*bufsize); 2008-09-09 00:31 do you want me to fix it, or do you want to sniff first? 2008-09-09 00:31 it's deep in the poo 2008-09-09 00:32 no question its a tux3 bug 2008-09-09 00:32 i suspected fuse 2008-09-09 00:32 the d's above prove it 2008-09-09 00:32 hm 2008-09-09 00:32 should be able to reproduce easily either with tux3.c or inode.c 2008-09-09 00:33 hack the test a little 2008-09-09 00:33 starts out like this: 2008-09-09 00:33 ---- delete file ---- 2008-09-09 00:33 lookup inode 0x21, 0 + 21 2008-09-09 00:33 open_inode: found inode 0x21 2008-09-09 00:33 mode 0100666 uid 0 gid 0 root 21:1 2008-09-09 00:33 free <- [0] 2008-09-09 00:33 free <- [0] 2008-09-09 00:33 bfree: block 0x0 already free! 2008-09-09 00:33 free <- [0] 2008-09-09 00:33 bfree: block 0x0 already free! 2008-09-09 00:34 ffffdddddddddddd <- I'm surprised the allocator manages to deal with this 2008-09-09 00:34 that repeats for a while 2008-09-09 00:34 then 2008-09-09 00:34 filemap_blockio: read <0:bbbbbbbb> 2008-09-09 00:34 filemap_blockio: unmapped block bbbbbbbb 2008-09-09 00:34 free <- [ffffdddddddddddd] 2008-09-09 00:34 got to check some limits there 2008-09-09 00:34 bfree: block 0xffffdddddddddddd already free! 2008-09-09 00:34 and the loop begins 2008-09-09 00:34 it's possibly trying to treat a block of zeros a dleaf 2008-09-09 00:35 or it did a getblk where it should have done a bread 2008-09-09 00:35 something :-) 2008-09-09 00:35 is this the first tux3 bug fuse has found? 2008-09-09 00:35 think so 2008-09-09 00:36 actually, fuse found a couple bugs in dir.c 2008-09-09 00:40 ah i see 2008-09-09 00:40 so soon? 
2008-09-09 00:40 reproduced the bug by adding a test case to dleaf.c 2008-09-09 00:40 and running make dleaftest 2008-09-09 00:40 your favorite file 2008-09-09 00:41 empty dleaf is the culprit 2008-09-09 00:41 and the fix? 2008-09-09 00:42 fix? 2008-09-09 00:42 its more fun to break! 2008-09-09 00:42 :P 2008-09-09 00:42 so we allocate a dleaf even for a 0 length file? 2008-09-09 00:43 that is something that will be optimized away right? 2008-09-09 00:44 143 /* 2008-09-09 00:44 144 * Reasons this dleaf truncater sucks: 2008-09-09 00:44 yes 2008-09-09 00:44 haha 2008-09-09 00:44 it will be optimized away 2008-09-09 00:44 good thing we aren't doing premature optimization 2008-09-09 00:44 very good 2008-09-09 00:45 bug wouldnt be noticed 2008-09-09 00:45 flips: were you high when you wrote this? 2008-09-09 00:45 quite 2008-09-09 00:45 konrad was complicit 2008-09-09 00:46 so what is the condition it doesn't handle? 2008-09-09 00:46 heh 2008-09-09 00:46 empty dleaf, but initialized? 2008-09-09 00:46 where did those ddddddd's come from? 
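The culprit turned out to be truncating an empty dleaf. Below is a hypothetical sketch of a chop loop that is bounded by the leaf's own entry count. With that bound, an empty leaf is a no-op, and an implausible count is rejected rather than walked. The `struct dleaf` layout is invented here and is not tux3's real format. Extents that straddle the truncation point are not handled in this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DLEAF_MAX 16 /* hypothetical leaf capacity */

struct extent { uint64_t index; uint64_t block; unsigned count; };
struct dleaf { unsigned used; struct extent ext[DLEAF_MAX]; };

/* Drop every extent whose logical start is at or after `from`.
 * Returns the number of blocks released, or -1 if the leaf is corrupt. */
static long dleaf_chop_sketch(struct dleaf *leaf, uint64_t from)
{
	unsigned keep = 0;
	long freed = 0;

	if (leaf->used > DLEAF_MAX)
		return -1;              /* count came from garbage, do not trust it */
	for (unsigned i = 0; i < leaf->used; i++) {
		if (leaf->ext[i].index >= from)
			freed += leaf->ext[i].count; /* a real fs would bfree() these */
		else
			leaf->ext[keep++] = leaf->ext[i];
	}
	leaf->used = keep;
	return freed;
}
```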
2008-09-09 00:46 nah it just overflowed 2008-09-09 00:47 to some other area 2008-09-09 00:47 at first it was all zeros 2008-09-09 00:48 flips: in general how are we going to deal with corruption 2008-09-09 00:48 we need a lot more integrity checking 2008-09-09 00:48 it will slow it down 2008-09-09 00:48 yes 2008-09-09 00:48 that's life 2008-09-09 00:49 the rule is: you should be able to randomize any block and not cause an oops 2008-09-09 00:49 everything has to fail on nasty random disk data 2008-09-09 00:49 definitely 2008-09-09 00:49 filesystems are hard 2008-09-09 00:49 that can be pretty lightweight checking 2008-09-09 00:49 see dir.c 2008-09-09 00:49 very mature code 2008-09-09 00:49 has obviously gone through a lot of fixes in that regard 2008-09-09 00:50 not nice code mind you 2008-09-09 00:50 just heavily fixed 2008-09-09 00:51 so i was able to find that bug anyway 2008-09-09 00:51 because "touch file" is working in my repo :) 2008-09-09 00:52 :-) 2008-09-09 00:53 well sort-of working 2008-09-09 00:53 the mtime is wrong 2008-09-09 00:53 expected 2008-09-09 00:54 ah duh 2008-09-09 00:54 is getaatr just broken? 2008-09-09 00:54 hm 2008-09-09 00:54 could be 2008-09-09 00:54 i'm calling store_inode after setting the i_mtime 2008-09-09 00:54 er save_inode 2008-09-09 00:55 it does seem to be doing it by looking at the debug info 2008-09-09 00:55 huh, the xattr encoder seems to work 2008-09-09 00:55 you also have to set the ->present bit for the mtime 2008-09-09 00:56 I don't think anything does that for you 2008-09-09 00:58 i thought i saw something in inode.c 2008-09-09 00:58 hrm 2008-09-09 00:59 way back 2008-09-09 00:59 then it became discretionary 2008-09-09 00:59 without a test case ;-) 2008-09-09 00:59 oh 2008-09-09 00:59 tisk tisk 2008-09-09 01:12 ? 
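The rule stated here is that randomizing any block must not cause an oops. In practice that usually comes down to cheap checks before trusting on-disk counts and block pointers. Below is a sketch under assumed names: `LEAF_MAGIC`, `leaf_head`, `leaf_entry` and the 4 KiB block size are all hypothetical. Fields are read with `memcpy` so an arbitrary byte buffer is safe to inspect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096      /* hypothetical block size */
#define LEAF_MAGIC 0x1eafu   /* hypothetical magic number */

struct leaf_head { uint16_t magic, count; uint32_t pad; };
struct leaf_entry { uint64_t block; uint32_t len, pad; };

/* Returns 1 only if every field a later pass would use is in range. */
static int leaf_sane(const void *buf, uint64_t volume_blocks)
{
	const unsigned char *p = buf;
	struct leaf_head head;
	struct leaf_entry e;
	size_t room = (BLOCK_SIZE - sizeof head) / sizeof e;

	memcpy(&head, p, sizeof head);
	if (head.magic != LEAF_MAGIC || head.count > room)
		return 0;
	for (size_t i = 0; i < head.count; i++) {
		memcpy(&e, p + sizeof head + i * sizeof e, sizeof e);
		if (e.block >= volume_blocks || e.len > volume_blocks - e.block)
			return 0;   /* extent runs off the end of the volume */
	}
	return 1;
}
```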
2008-09-09 01:12 encode_xattrs is functional, now for decode 2008-09-09 01:14 flips: you can pull fuse utime support + dealf bug test from me 2008-09-09 01:14 ok, need a place to decode the xattrs to 2008-09-09 01:14 hmm 2008-09-09 01:14 kay 2008-09-09 01:14 no fix yet, been dicking with the mtime issue 2008-09-09 01:14 you just want to send your cruft over? 2008-09-09 01:14 oh 2008-09-09 01:14 utime 2008-09-09 01:14 its annoying me that everything is 1969 2008-09-09 01:15 epoch 2008-09-09 01:15 we need the logic to make the timestamps default to eachother 2008-09-09 01:15 but thats not even the issue 2008-09-09 01:15 even the ctime is broken 2008-09-09 01:15 have you thought about setting up a dedicated repo for me to pull from, into which you only pull one head? 2008-09-09 01:15 or do you want your dicking around recorded for posterity ;-) 2008-09-09 01:16 theres only one head in my repo 2008-09-09 01:16 because you merged 2008-09-09 01:16 but all your merges show up over here 2008-09-09 01:16 ah, annoying 2008-09-09 01:16 why is hg such a pita 2008-09-09 01:16 unless you do that version specific pull 2008-09-09 01:16 this is supposed to just work 2008-09-09 01:16 git is identical 2008-09-09 01:17 well its not really wrong 2008-09-09 01:17 cant you cherry pick my change? 2008-09-09 01:17 yes 2008-09-09 01:17 but its time consuming 2008-09-09 01:17 should be oneliner 2008-09-09 01:17 much more efficient for you to just pull your time to a dedicated repo. that can be automatic 2008-09-09 01:17 pull your tip 2008-09-09 01:17 well 2008-09-09 01:18 not much difference 2008-09-09 01:18 try hg view 2008-09-09 01:18 you'll see all the extra stuff 2008-09-09 01:20 shapor, the times are all broken because they are never set 2008-09-09 01:20 you can see where to set them in inode.c 2008-09-09 01:20 there's a gettime available I think 2008-09-09 01:20 http://www.selenic.com/mercurial/wiki/index.cgi/CommunicatingChanges#line-63 2008-09-09 01:20 would that be easier? 
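This stretch raises two points. First, a timestamp is only stored if its `->present` bit is set. Second, unset times show up as 1969, which is time 0 rendered in a timezone west of UTC. Below is a hypothetical sketch of the proposed "timestamps default to each other" logic. The bit names and struct are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical attribute-present bits, modeled on the ->present idea. */
enum { MTIME_BIT = 1 << 0, CTIME_BIT = 1 << 1, ATIME_BIT = 1 << 2 };

struct iattr_sketch {
	unsigned present;
	int64_t mtime, ctime, atime; /* seconds since the epoch */
};

/* Setting a field must also set its bit, or the encoder would skip it. */
static void set_mtime(struct iattr_sketch *a, int64_t t)
{
	a->mtime = t;
	a->present |= MTIME_BIT;
}

/* After decoding, let absent timestamps take the value of one that is
 * present instead of staying at 0. The present mask is left unchanged. */
static void default_times(struct iattr_sketch *a)
{
	int64_t t = a->present & MTIME_BIT ? a->mtime :
		    a->present & CTIME_BIT ? a->ctime :
		    a->present & ATIME_BIT ? a->atime : 0;
	if (!(a->present & MTIME_BIT)) a->mtime = t;
	if (!(a->present & CTIME_BIT)) a->ctime = t;
	if (!(a->present & ATIME_BIT)) a->atime = t;
}
```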
2008-09-09 01:21 doesn't decode_attrs set them in struct inode? 2008-09-09 01:21 gets called in open_inode 2008-09-09 01:22 they never get set to an actual time 2008-09-09 01:23 they should now with my utime code though 2008-09-09 01:23 not sure why its not working 2008-09-09 01:23 hrm 2008-09-09 01:23 too many interfaces 2008-09-09 01:24 bah 2008-09-09 01:24 way too many 2008-09-09 01:24 inode->i_mtime = inode->i_ctime = inode->i_atime = iattr->mtime; ><- this is where to set the time 2008-09-09 01:24 in inode.c 2008-09-09 01:25 I thought some of the times were stored differently (lower resolution) 2008-09-09 01:25 brings up the question: what is the format of a tux3 time? 2008-09-09 01:25 yeah 2008-09-09 01:25 yes 2008-09-09 01:25 so, I'd like to get away from the traditional decimal encoding 2008-09-09 01:25 were the fraction is millionths or billionths 2008-09-09 01:25 and use strictly binary 2008-09-09 01:26 that means multiplying and dividing to convert to the braindamaged linux format 2008-09-09 01:26 not sure whether this is wise 2008-09-09 01:27 hrm 202 u64 i_size, i_mtime, i_ctime, i_atime; 2008-09-09 01:27 u64? 2008-09-09 01:28 that's just in the inode, the memory version 2008-09-09 01:28 on disk it gets sqzed, maybe 2008-09-09 01:28 yeah but still 2008-09-09 01:28 whats the point of u64 in memory even? 2008-09-09 01:28 u32 is too small 2008-09-09 01:28 and c doesn't play well with others 2008-09-09 01:28 so we have our own format? 
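The "strictly binary" time format floated here could look like the sketch below. It assumes a u64 with 32 bits of seconds and 32 bits of binary fraction; the chat does not specify the split. The multiply and divide to and from Linux's nanosecond format are the conversions mentioned above. Rounding on the way back makes the round trip exact for any valid nanosecond value. Unsigned 32-bit seconds run out in 2106, so a real format might give seconds more bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical on-disk time: high 32 bits = seconds,
 * low 32 bits = fraction of a second in units of 2^-32 s. */
typedef uint64_t bintime_t;

#define NSEC_PER_SEC 1000000000ULL

/* Assumes nsec < NSEC_PER_SEC, so the fraction fits in 32 bits. */
static bintime_t bintime_from(uint32_t sec, uint32_t nsec)
{
	return (uint64_t)sec << 32 | ((uint64_t)nsec << 32) / NSEC_PER_SEC;
}

static void bintime_to(bintime_t t, uint32_t *sec, uint32_t *nsec)
{
	uint64_t frac = t & 0xffffffffu;
	/* bintime_from truncates by less than 0.24 ns worth of fraction,
	 * so rounding to nearest here recovers the original nanoseconds */
	uint64_t ns = (frac * NSEC_PER_SEC + (1ULL << 31)) >> 32;

	*sec = (uint32_t)(t >> 32);
	/* a fraction read from disk could round up to a full second; clamp it */
	*nsec = ns >= NSEC_PER_SEC ? (uint32_t)(NSEC_PER_SEC - 1) : (uint32_t)ns;
}
```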
2008-09-09 01:29 we have to use the linux format there 2008-09-09 01:29 when we got to kernel 2008-09-09 01:29 hm 2008-09-09 01:29 because those are in the generic part of the inode 2008-09-09 01:29 i'm getting tired 2008-09-09 01:29 I haven't divided up the inode into generic and filesystem specific parts yet 2008-09-09 01:29 i think my changes are crap, might not want to merge them 2008-09-09 01:29 they dont add much 2008-09-09 01:29 other than sort-of working touch 2008-09-09 01:29 sleep on it 2008-09-09 01:29 fly away 2008-09-09 01:29 hack onthe plane 2008-09-09 01:30 see how nicely encode_xattrs came out 2008-09-09 01:30 in iattr.c 2008-09-09 01:31 decode_xattrs is a little messier because we have to guess how big to make the cache 2008-09-09 01:31 I suppose I'd better run a size-guessing pass first to know the size for thecache 2008-09-09 02:30 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3 2008-09-09 02:32 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-09 02:52 development is going pretty well 2008-09-09 03:02 I'd say 2008-09-09 04:42 konrad, your fuse post got linked from lwn.net: http://lwn.net/Articles/297308/ 2008-09-09 04:42 jesus 2008-09-09 04:43 night 2008-09-09 04:43 night 2008-09-09 04:55 where'd that get linked from? 2008-09-09 06:31 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-09 12:17 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-09 12:41 nice 2008-09-09 14:22 ACTION makes another cuppa before pulling the trigger on the "one less feature" psot 2008-09-09 14:22 post 2008-09-09 14:41 flips: by depending so much on LVM, aren't you limiting the OSs on top of tux3 can be run? What if I want to use tux3 with Solaris, FreeBSD or HP-UX? 
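The size-guessing pass proposed for `decode_xattrs` is a standard two-pass decode:

1. Walk the buffer once to count and validate the records.
2. Allocate the cache at exactly that size.
3. Walk the buffer again to fill it.

Below is a sketch assuming an invented record layout: a big-endian 16-bit atom number, a 16-bit length, then the value bytes. Tux3's actual encoding may differ. Only the record header is given a fixed byte order; the value bytes stay opaque, matching the earlier endianness discussion.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record layout: be16 atom, be16 len, then len value bytes. */
struct xattr_sketch { uint16_t atom; uint16_t len; const unsigned char *data; };

static size_t encode_xattr_sketch(unsigned char *out, uint16_t atom,
				  const void *val, uint16_t len)
{
	out[0] = atom >> 8; out[1] = atom & 0xff;
	out[2] = len >> 8;  out[3] = len & 0xff;
	memcpy(out + 4, val, len);
	return 4 + (size_t)len;
}

/* Pass 1: count records and reject truncated ones. Returns -1 if malformed. */
static long count_xattrs(const unsigned char *buf, size_t size)
{
	long n = 0;
	size_t pos = 0;

	while (pos < size) {
		if (size - pos < 4)
			return -1;
		size_t len = (size_t)buf[pos + 2] << 8 | buf[pos + 3];
		if (size - pos - 4 < len)
			return -1;
		pos += 4 + len;
		n++;
	}
	return n;
}

/* Pass 2: allocate once, then fill. Values point into buf, not copied. */
static struct xattr_sketch *decode_xattrs_sketch(const unsigned char *buf,
						 size_t size, long *count)
{
	long n = count_xattrs(buf, size);
	if (n < 0)
		return NULL;
	struct xattr_sketch *cache = calloc(n ? (size_t)n : 1, sizeof *cache);
	if (!cache)
		return NULL;
	size_t pos = 0;
	for (long i = 0; i < n; i++) {
		cache[i].atom = (uint16_t)(buf[pos] << 8 | buf[pos + 1]);
		cache[i].len = (uint16_t)(buf[pos + 2] << 8 | buf[pos + 3]);
		cache[i].data = buf + pos + 4;
		pos += 4 + (size_t)cache[i].len;
	}
	*count = n;
	return cache;
}
```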
2008-09-09 14:42 by "limiting the OSs" I mean "it will require a lot of work to make tux3 work with non-Linux OSs" 2008-09-09 14:43 they will have a lot of work anyway 2008-09-09 14:43 gpl license is incompatible with all those 2008-09-09 14:43 but if tux3 catches on in linux then they will just have to do it 2008-09-09 14:44 pgquiles, you mean to say you already read the one less feature post? 2008-09-09 14:46 ACTION thinks about making it an early sk8 day 2008-09-09 14:46 flips: yes 2008-09-09 14:47 anyway, this isn't more than people already depend on lvm 2008-09-09 14:47 ah, ok 2008-09-09 14:48 what I don't want to do is depend on an lvm built into a filesystem, maintained by a very small group of developers and used by only one application 2008-09-09 14:48 it's bad enough depending on my generic btree code ;-) 2008-09-09 14:49 :-) 2008-09-09 14:50 I'm really impressed at how much you do in so few lines of code 2008-09-09 14:50 easy when you leave out the error handling :-) 2008-09-09 14:50 well 2008-09-09 14:51 you make filesystem development sound easy, like "hey, this morning I feel like I'm going to write my filesystem" :-) 2008-09-09 14:51 trying to put some of that in too 2008-09-09 14:51 that's how I felt 6 weeks ago 2008-09-09 14:51 thought it could be prototyped in 2 weeks 2008-09-09 14:51 I was wrong 2008-09-09 14:51 looks like 8 weeks 2008-09-09 14:52 and that is only with the unexpected help of fuse 2008-09-09 14:52 and of course with the help of all the helpers 2008-09-09 14:52 are seem to be increasing exponentially 2008-09-09 14:53 who seem I mean 2008-09-09 14:54 timothy has a baby girl 2008-09-09 14:54 newborn? 2008-09-09 14:56 they are very nice while in the poop-machine age 2008-09-09 14:57 when teeth show... 
they cry all the time :-) 2008-09-09 14:58 newborn indeed 2008-09-09 14:58 it's worth the effort 2008-09-09 14:59 it is, indeed 2008-09-09 15:00 until they are 12 years old and start behaving like young terrorists :-D 2008-09-09 15:00 lkml is running slow today 2008-09-09 15:05 bedtime 2008-09-09 15:07 bye 2008-09-09 15:07 adios 2008-09-09 15:44 -!- kbingham(~kbingham@92.8.9.246) has joined #tux3 2008-09-09 15:44 finally showed up on lkml: http://lkml.org/lkml/2008/9/9/402 2008-09-09 15:45 "Tux3 Report: One less feature" 2008-09-09 16:15 new file 2008-09-09 16:15 xattr.c 2008-09-09 16:16 will skate first then make atom resolution actually work 2008-09-09 16:25 a quick q: is it ok to submit some patches to allow tux3 to compile on mac? :P 2008-09-09 16:34 if you can work out the license issues 2008-09-09 16:34 not sure how you'd do that 2008-09-09 16:34 compile under a linux emulator maybe 2008-09-09 16:36 I'm only interested in running tux3 in userspace 2008-09-09 16:36 does this conflict with any license? 2008-09-09 16:37 not that I know of 2008-09-09 16:38 good :D 2008-09-09 16:38 sounds like a cool project 2008-09-09 16:39 the fuse lib you link with will need to be gpl 2008-09-09 16:39 or compatible 2008-09-09 16:39 gpl v3 compatible, which is a little easier than v2 2008-09-09 16:39 I don't want to link to fuse actually 2008-09-09 16:40 directly using kernel calls is traditionally ok 2008-09-09 16:40 you're probably using libc, right? 
2008-09-09 16:40 I think macos uses libc 2008-09-09 16:40 that's compatible 2008-09-09 16:40 true 2008-09-09 16:40 some things are not quite the same 2008-09-09 16:41 the libc is deeply embedded in mac 2008-09-09 16:41 I can imagine 2008-09-09 16:41 obviously, tux3 will really need forks now ;-) 2008-09-09 16:41 tux3 xattrs are intended to be forks as well 2008-09-09 16:41 one cool thing I want to accomplish to be able to run an FTP server to expose the tux3 ;-) 2008-09-09 16:42 don't be in a big rush to do that unless you are great at debugging 2008-09-09 16:42 well... I am actually attempting this for some linux fs anyway 2008-09-09 16:43 doing it also for tux3 fits well :D 2008-09-09 16:43 ftp load is fairly forgiving 2008-09-09 16:43 read mostly 2008-09-09 16:44 ok, I'd better get my skate in 2008-09-09 16:44 promised to do a tux3 university session tonight at 8 2008-09-09 16:46 amazing how many more penis extension spams I get when I am actively posting about Tux3, I wonder if there is a relation 2008-09-09 16:46 8pm? (like in 15 minutes?) 2008-09-09 16:47 ohh... the other coast 2008-09-09 16:47 right 2008-09-09 16:48 great! we have time to go home and eat till then :P 2008-09-09 16:49 do we need to review some material before showing up? :D 2008-09-09 16:50 optional 2008-09-09 16:50 if you have time, read "understanding the linux kernel" and "linux device drivers" ;-) 2008-09-09 16:50 haha 2008-09-09 16:50 understanding is on my right, the ldd is at home :P 2008-09-09 16:51 I found the "understanding..." 
more useful so that's why I hauled it to school 2008-09-09 16:52 linux device drivers will grow on you 2008-09-09 16:52 this 3rd edition really put some weight 2008-09-09 16:52 it's the deeper of the two 2008-09-09 16:52 lets see which edition I have 2008-09-09 16:52 LDD is the first one I got in touch actually 2008-09-09 16:52 the first edition 2008-09-09 16:52 some years ago :D 2008-09-09 16:53 3rd, and I still find it on the superficial side 2008-09-09 16:53 for fs stuff or in general? 2008-09-09 16:57 in general 2008-09-09 16:57 vfs, vm, whatever 2008-09-09 16:58 :D 2008-09-09 17:31 damn... looks like 8pm your time is about 4am in uk :S 2008-09-09 17:33 i'll have to be a passive observer in the morning :) 2008-09-09 18:46 back 2008-09-09 18:46 what's wrong with being up at 4 a.m? 2008-09-09 19:11 mmm, a little sake and sushi to get focussed 2008-09-09 19:11 -!- vcgomes(~vcgomes@li17-238.members.linode.com) has joined #tux3 2008-09-09 19:50 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-09 19:55 who's here and awake? 2008-09-09 19:56 I am ;-) 2008-09-09 19:56 that's a quorum 2008-09-09 19:56 mmm, sake and sushi... 2008-09-09 19:57 was good 2008-09-09 19:57 well 2008-09-09 19:57 I better heat up another one 2008-09-09 19:58 ACTION is also awake 2008-09-09 19:59 got a browser ready? 2008-09-09 20:00 always... 2008-09-09 20:00 lxr? :P 2008-09-09 20:00 http://lxr.linux.no/linux <- ok, open this 2008-09-09 20:00 of course 2008-09-09 20:00 as expected 2008-09-09 20:00 -!- RalucaME(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-09 20:00 -!- nataliep(~nataliep@72.14.224.1) has joined #tux3 2008-09-09 20:00 hi natalie 2008-09-09 20:00 hi dan 2008-09-09 20:01 max, have you met natalie? 2008-09-09 20:01 maze? 2008-09-09 20:01 flips: http://lxr.linux.no/linux <- ok, open this 2008-09-09 20:01 hmm? 
2008-09-09 20:01 http://lxr.linux.no/linux+v2.6.26.5/fs/open.c#L1106 <- let's start with sys_open 2008-09-09 20:02 everybody see it? 2008-09-09 20:02 yes - we should perhaps first ask though how many people are listnening 2008-09-09 20:02 ACTION nods 2008-09-09 20:02 ACTION nods too 2008-09-09 20:02 3 is fine with me 2008-09-09 20:02 ACTION nods sagely 2008-09-09 20:02 nods :) 2008-09-09 20:03 it's logged anyway 2008-09-09 20:03 true 2008-09-09 20:03 ok, every syscall in linux starts with sys_ 2008-09-09 20:03 and continues with the name you get from man 2008-09-09 20:03 so man 2 open 2008-09-09 20:03 all it does is a little linkage 2008-09-09 20:04 then real action starts in do_sys_open 2008-09-09 20:04 so lets go there by clicking on it 2008-09-09 20:04 and click on the Function link 2008-09-09 20:04 http://lxr.linux.no/linux+v2.6.26.5/fs/open.c#L1084 2008-09-09 20:04 why isn't sys_open isn't just a call to sys_openat(AT_FDCWD, ...)? 2008-09-09 20:04 we're still in the same file 2008-09-09 20:04 good question 2008-09-09 20:05 ask al viro and add some epithet on the end ;-) 2008-09-09 20:05 can sys_* never call sys_* ? 2008-09-09 20:05 or is this something that could be cleaned up? 2008-09-09 20:05 syscalls use a weird linkage 2008-09-09 20:05 gcc and do it, but its odd 2008-09-09 20:05 can do it 2008-09-09 20:05 sys_creat calls sys_open 2008-09-09 20:05 so it could probably be replaced 2008-09-09 20:06 I open call sys_ functions from deep in kernel 2008-09-09 20:06 often 2008-09-09 20:06 let me recant 2008-09-09 20:06 syscalls _sometimes_ use a weird linkage 2008-09-09 20:06 so yes you could nest them 2008-09-09 20:06 al doesn't for no reason I know 2008-09-09 20:07 it's like "yuck, is a nasty top level entry point" 2008-09-09 20:07 gcc and do it, but its odd - what did you mean? 
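The `sys_open` versus `sys_openat(AT_FDCWD, ...)` question has a simple userspace side: the two calls are interchangeable there. `AT_FDCWD` resolves a relative path against the current directory, just as `open` does. In the 2.6.26 source linked above, both entry points hand off to `do_sys_open`. Below is a small check using only POSIX calls; the helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open `path` with open() and with openat(AT_FDCWD, ...), then compare
 * the inodes. Returns 1 if they match, 0 if not, -1 on any error. */
static int open_matches_openat(const char *path)
{
	int a = open(path, O_RDONLY);
	int b = openat(AT_FDCWD, path, O_RDONLY);
	int ret = -1;
	struct stat sa, sb;

	if (a >= 0 && b >= 0 && fstat(a, &sa) == 0 && fstat(b, &sb) == 0)
		ret = sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
	if (a >= 0)
		close(a);
	if (b >= 0)
		close(b);
	return ret;
}
```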
2008-09-09 20:07 I was rambling 2008-09-09 20:07 ;-) 2008-09-09 20:07 the weird stuff happens before we even get there 2008-09-09 20:07 in the syscall table 2008-09-09 20:08 so by the time we hit sys_* we're in pure C land? 2008-09-09 20:08 #ok, so we're in a different address space than we were a nanosceond 2008-09-09 20:08 yes 2008-09-09 20:08 usually 2008-09-09 20:08 some syscalls have strange register linkage 2008-09-09 20:08 anyway the vfs doesn't much care about that 2008-09-09 20:08 it gets away from syscall land as soon as it can 2008-09-09 20:09 what we are going to see, is a lot of messing around with user addresses 2008-09-09 20:09 because a nanosecond ago or so, we were in processor ring 3 2008-09-09 20:09 userspace 2008-09-09 20:09 now we're in ring 0 2008-09-09 20:09 different address space 2008-09-09 20:09 kind of 2008-09-09 20:10 different priveledge level 2008-09-09 20:10 that too 2008-09-09 20:10 everthing is a little different 2008-09-09 20:10 kind of like the twilight zone 2008-09-09 20:10 we're on the inside of the glass looking out now, like in that harry potter movie 2008-09-09 20:10 ok 2008-09-09 20:10 so we have to get the name for the open 2008-09-09 20:10 it's in a different address space 2008-09-09 20:11 so we do copy_from_user to get it 2008-09-09 20:11 just looking for the def of getname 2008-09-09 20:11 it's not very interesting actually 2008-09-09 20:11 it stores the name on a full page of kernel memory 2008-09-09 20:12 or it used to 2008-09-09 20:12 now I see we use a kmem cache for it 2008-09-09 20:12 http://lxr.linux.no/linux+v2.6.26.5/fs/namei.c#L141 2008-09-09 20:12 http://lxr.linux.no/linux+v2.6.26.5/include/linux/fs.h#L1615 2008-09-09 20:12 thanks 2008-09-09 20:12 and an audit hook 2008-09-09 20:13 things change around in here fairly frequently 2008-09-09 20:13 it's usually worth starting from the top in lxr every time 2008-09-09 20:13 just so you can check for details that changed 2008-09-09 20:13 by the top you mean all the 
way from sys_open or some other top? 2008-09-09 20:13 that audit thingy is new 2008-09-09 20:13 right 2008-09-09 20:14 like I said, getname is boring 2008-09-09 20:14 let's go back to do_ _open 2008-09-09 20:14 perhaps, another question: how much of a fs is driven by userspace triggered syscalls? 2008-09-09 20:14 nearly all of it 2008-09-09 20:14 particularly for traditional fs's 2008-09-09 20:14 new ones tend to have some daemons helping 2008-09-09 20:15 generally, the more daemons, the less reliable 2008-09-09 20:15 which are effectively kernel threads doing syscalls? 2008-09-09 20:15 not doing syscalls 2008-09-09 20:15 using internal interfaces 2008-09-09 20:15 using syscalls internally sucks, because of being in the wrong address space 2008-09-09 20:15 the syscall expects to get its data from userspace 2008-09-09 20:15 oh, right the copy_from_user stuff 2008-09-09 20:15 right 2008-09-09 20:16 anyway you're using syscalls internally, something linux is broken 2008-09-09 20:16 or you're stupid 2008-09-09 20:16 heh 2008-09-09 20:16 about 50/50 2008-09-09 20:16 the next interesting place is do_filp_open 2008-09-09 20:17 lxr is a little funky indexing some of these 2008-09-09 20:17 would do_filp_open be the kernel-internal interface to open? 2008-09-09 20:18 factoring is a little arbitrary 2008-09-09 20:18 [ie. would this be what you would call from above mentioned kernel threads/daemons?] 
2008-09-09 20:18 it's another helper that happens to do almost all the work 2008-09-09 20:18 yes you can 2008-09-09 20:18 if its not static, then something is using it 2008-09-09 20:18 often something bogus 2008-09-09 20:18 something external that is 2008-09-09 20:18 not statie = part of kernel api 2008-09-09 20:19 often unwisely ;-) 2008-09-09 20:19 http://lxr.linux.no/linux+v2.6.26.5/fs/namei.c#L1761 2008-09-09 20:19 I wish lxr was smarter about finding defs of extern functions 2008-09-09 20:19 I went to usage, then to the first reference onthe list 2008-09-09 20:19 now things are happening 2008-09-09 20:20 [actually filp_open seems to be the kernel-interface, not that it much matters] 2008-09-09 20:20 that is true 2008-09-09 20:20 see "arbitrary factoring" above 2008-09-09 20:20 it's kind of a pile is some ways ;-) 2008-09-09 20:20 in other ways it's beautiful 2008-09-09 20:20 only about 3 of those ;-) 2008-09-09 20:21 we'll get to some scary code now 2008-09-09 20:21 do_filp_open is pretty big... 2008-09-09 20:21 path_lookup_open 2008-09-09 20:22 it;'s big because it's implementing all of unix semantics + all of linux semantics + historical cruft + arcane voodooism nobody is quite sure about 2008-09-09 20:22 http://lxr.linux.no/linux+v2.6.26.5/fs/namei.c#L1238 2008-09-09 20:23 so the vfs layer does permissions checking... not the fs itself? 2008-09-09 20:23 we're going to stay away from path lookup to avoid brain damage 2008-09-09 20:23 that is correct 2008-09-09 20:23 the vfs checks permissions and does a lot of locking too 2008-09-09 20:23 also implements the namespace caching 2008-09-09 20:23 it does a huge amount of work 2008-09-09 20:24 what's namespace caching? 2008-09-09 20:24 dentry cache 2008-09-09 20:24 every time you open a file, linux creates a dentry for the name 2008-09-09 20:24 that lives in cache 2008-09-09 20:24 dentry points at inode 2008-09-09 20:24 so what does the dentry cache map between? 
2008-09-09 20:24 dentries are pretty big, inodes are pretty big memory structures too 2008-09-09 20:25 filename and inode (possibly lack of inode)? 2008-09-09 20:25 the dentry maps filename -> inode 2008-09-09 20:25 in cache 2008-09-09 20:25 does filename include full path? 2008-09-09 20:25 only when tere is a miss in the dentry cache does the vfs go to the filesystem 2008-09-09 20:25 no, not the full path 2008-09-09 20:25 Important sizes: 2008-09-09 20:25 block 1024 2008-09-09 20:25 inode 300 2008-09-09 20:25 dentry 128 2008-09-09 20:25 bh 56 2008-09-09 20:25 kmem_cache 12 2008-09-09 20:25 the parent inode and the filename 2008-09-09 20:25 relative to the fs root? 2008-09-09 20:25 ah 2008-09-09 20:25 razvanm, nice 2008-09-09 20:26 flips might want to correct me ;-) 2008-09-09 20:26 so every time you open a file, you get a dentry+inode+file 2008-09-09 20:26 already a lot of cache memory 2008-09-09 20:26 for a tiny thing maybe 2008-09-09 20:26 with 6 btyes in it echo hello >foo 2008-09-09 20:26 oh thats why it gets pretty big 2008-09-09 20:27 it's only the beginning 2008-09-09 20:27 among other slabs 2008-09-09 20:27 you also get an "address_space" for the inode 2008-09-09 20:27 misnamed 2008-09-09 20:27 that is the radix tree 2008-09-09 20:27 so if the file is opened, the dentry + inode are locked in cache? 2008-09-09 20:27 yes 2008-09-09 20:27 and the whole chain of parents 2008-09-09 20:27 up to the superblock of the fs 2008-09-09 20:27 not locked 2008-09-09 20:28 they can be evicted 2008-09-09 20:28 usage count increased? 2008-09-09 20:28 only until the inode goes away 2008-09-09 20:28 sorry 2008-09-09 20:28 you've lost me then. 2008-09-09 20:28 the inode's use count is elevated 2008-09-09 20:28 until the dentry goes away 2008-09-09 20:28 it's about the nastiest part of the whole vfs 2008-09-09 20:28 and we're here already 2008-09-09 20:29 so dentries can come and go as they please? 
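The mapping described here is (parent, name) to inode, and only a miss goes to the filesystem. A toy hash table illustrates it. This is not the kernel's dcache, whose `d_lookup` has RCU, locking and a different hash; all names below are hypothetical. A NULL inode models a negative dentry, i.e. a cached "no such name" answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_inode { unsigned long ino; };

struct toy_dentry {
	struct toy_dentry *parent;
	char name[32];
	struct toy_inode *inode;        /* NULL = negative dentry */
	struct toy_dentry *hash_next;   /* chain within a hash bucket */
};

#define HASH_SIZE 64
static struct toy_dentry *hash_table[HASH_SIZE];

/* The key is the parent pointer plus the name, not the full path. */
static unsigned d_hash_sketch(const struct toy_dentry *parent, const char *name)
{
	uintptr_t h = (uintptr_t)parent;
	while (*name)
		h = h * 31 + (unsigned char)*name++;
	return (unsigned)(h % HASH_SIZE);
}

static void d_add_sketch(struct toy_dentry *d)
{
	unsigned h = d_hash_sketch(d->parent, d->name);
	d->hash_next = hash_table[h];
	hash_table[h] = d;
}

/* A NULL return is a miss: only then would the VFS call the fs ->lookup. */
static struct toy_dentry *d_lookup_sketch(const struct toy_dentry *parent,
					  const char *name)
{
	for (struct toy_dentry *d = hash_table[d_hash_sketch(parent, name)]; d;
	     d = d->hash_next)
		if (d->parent == parent && strcmp(d->name, name) == 0)
			return d;
	return NULL;
}
```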
2008-09-09 20:29 what happens is, dentries spend a lot of their life sitting around in cache with zero use count 2008-09-09 20:29 that's what happens if you open a file, do something, and close it 2008-09-09 20:29 $ cat /proc/slabinfo | egrep 'dentry|#' 2008-09-09 20:29 # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail> 2008-09-09 20:29 dentry 253015 253576 132 29 1 : tunables 120 60 8 : slabdata 8744 8744 0 2008-09-09 20:29 only when the vm comes along and tries to shrink the caches to recover memory do the dentries and inodes go away 2008-09-09 20:29 note 132 2008-09-09 20:30 yes, something ahs been pushing stuff out 2008-09-09 20:30 it changes from time to time 2008-09-09 20:30 these days, linux pushes too much cache out at the wrong times 2008-09-09 20:30 you will notice that if you run on a slow machine 2008-09-09 20:30 see, this is the real vfs course ;-) 2008-09-09 20:31 ok, lifetime of objects in cache is one of the biggest touchy spots in linux 2008-09-09 20:31 it's often very hard to know what owns what 2008-09-09 20:31 there's no way to tell the vfs you're doing a bg filesystem scan and to not cache for eternity? 2008-09-09 20:31 and yet, you have to when you work on fs code 2008-09-09 20:32 there are various ways to tell it that 2008-09-09 20:32 good ways is another question 2008-09-09 20:32 we have the concept of hot and cold ends of lru list 2008-09-09 20:32 when something is gets accessed, it gets moved to the hot end 2008-09-09 20:32 and stuff is evicted from the cold end 2008-09-09 20:32 in theory 2008-09-09 20:32 in practice...
well 2008-09-09 20:33 linux has been benchmarked as worse than random replacement policy 2008-09-09 20:33 somebody needs to go in and fix that 2008-09-09 20:33 so a queue basically 2008-09-09 20:33 the only way i'm aware of to inform the kernel about your intentions is posix_fadvise, and that doesnt let you do much 2008-09-09 20:33 a lru list, yes 2008-09-09 20:33 acts like a queue 2008-09-09 20:33 certainly nothing related to dentries 2008-09-09 20:33 old stuff is supposed to move down to the cold end and get evicted 2008-09-09 20:34 shapor, though you were on a plane 2008-09-09 20:34 just a sec 2008-09-09 20:34 not yet 2008-09-09 20:34 back 2008-09-09 20:34 is there one global dentry cache? per cpu? per socket? per fs? per inode? 2008-09-09 20:34 ok, we're not doing vm 2008-09-09 20:34 this is vfs ;-) 2008-09-09 20:34 is global, right? :P 2008-09-09 20:34 there is one global dentry cache 2008-09-09 20:35 it is indexed by fs*dir*name 2008-09-09 20:35 so it acts like one per fs 2008-09-09 20:35 so it maps superblock:inode:filename -> inode? 2008-09-09 20:35 the only way i know of purging it is umount(), right flips? 2008-09-09 20:35 yes 2008-09-09 20:35 in general,yes 2008-09-09 20:35 there are internal interfaces for purging 2008-09-09 20:36 a fs has access to that 2008-09-09 20:36 but almost nobody understands how to use that or cares to find out 2008-09-09 20:36 if you get it wrong, al will bark at you 2008-09-09 20:36 aren't we wasting a lot of memory by continuously keeping the 'fs' in there? 
most systems don't have that many mounted filesystems 2008-09-09 20:36 we waste huge buckets of memory 2008-09-09 20:37 yes, linux is a little special in this regard 2008-09-09 20:37 dentry cache is a linux only thing 2008-09-09 20:37 it gives a performance advantage in general 2008-09-09 20:37 but it uses massive gobs of memory 2008-09-09 20:37 it's tricky 2008-09-09 20:38 you can always print out the path a file was opened by 2008-09-09 20:38 other OSs doesn't cache the dentries? 2008-09-09 20:38 by following parent links in the dentry cache 2008-09-09 20:38 I don't think other os's have dentries? 2008-09-09 20:38 I'm not _that_ familiar with bsd etc 2008-09-09 20:38 but I think not 2008-09-09 20:38 earlier you'd said the dentries could be evicted? 2008-09-09 20:38 the above gets interesting when the namespace topology is changing while you follow links 2008-09-09 20:39 they can 2008-09-09 20:39 let's go find the dentry cache 2008-09-09 20:39 so how does that match up with being able to follow the parent links in the dentry cache? 2008-09-09 20:39 2008-09-09 20:40 hmm, no dentry.c 2008-09-09 20:40 http://lxr.linux.no/linux+v2.6.26.5/include/linux/dcache.h#L81 2008-09-09 20:40 there's a dcache.h 2008-09-09 20:41 ok, namei.c is the home of the dentry cache 2008-09-09 20:41 inconsistent naming 2008-09-09 20:42 struct dentry is defined in dcache.h though 2008-09-09 20:42 in general in linux, you want to be looking for "get" and "put" operations 2008-09-09 20:42 get means in object count, put means dec 2008-09-09 20:42 strange terminology, made in linux I think 2008-09-09 20:43 dache.c 2008-09-09 20:43 http://lxr.linux.no/linux+v2.6.26.5/fs/dcache.c 2008-09-09 20:43 so, dput 2008-09-09 20:43 http://lxr.linux.no/linux+v2.6.26.5/fs/dcache.c#L185 2008-09-09 20:44 big mess 2008-09-09 20:44 but you have to get familiar with it 2008-09-09 20:44 we also have iput, drop usage count of an inode 2008-09-09 20:45 is this too much down inthe nitty gritty? 
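The lifetime rules above work like this:

- A get raises an entry's use count, and a put drops it.
- An entry at zero stays cached.
- Memory pressure later evicts unused entries from the cold end of an LRU.

The sketch below models this with a toy doubly linked list. The names `cget`, `cput` and `shrink` are hypothetical, there is no locking, and the real dcache and icache are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Toy cache entry: a use count plus LRU links. */
struct centry {
	int count;                  /* active users */
	struct centry *prev, *next; /* LRU links; lru.next is the hot end */
};

static struct centry lru = { 0, &lru, &lru }; /* list head */

static void centry_init(struct centry *e)
{
	e->count = 0;
	e->prev = e->next = e;      /* self-linked = not on the LRU */
}

static void lru_unlink(struct centry *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* get: take a reference and move the entry to the hot end */
static struct centry *cget(struct centry *e)
{
	e->count++;
	lru_unlink(e);
	e->next = lru.next;
	e->prev = &lru;
	lru.next->prev = e;
	lru.next = e;
	return e;
}

/* put: drop a reference; at zero the entry stays cached, it is not freed */
static void cput(struct centry *e)
{
	e->count--;
}

/* Walk from the cold end, evicting up to n unused entries and skipping
 * entries still in use. Returns how many were evicted. */
static int shrink(int n)
{
	int evicted = 0;
	struct centry *e = lru.prev;

	while (e != &lru && evicted < n) {
		struct centry *prev = e->prev;
		if (e->count == 0) {
			lru_unlink(e);
			e->prev = e->next = e;
			evicted++;
		}
		e = prev;
	}
	return evicted;
}
```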
2008-09-09 20:45 nope 2008-09-09 20:45 speaking for myself only of course - but the nitty gritty is always what I failed to grasp 2008-09-09 20:45 figuring out how an inode gets released is challenging 2008-09-09 20:46 look at iput, then iput_final 2008-09-09 20:46 http://lxr.linux.no/linux+v2.6.26.5/fs/inode.c#L1149 2008-09-09 20:46 generally, if the fs does not want to take care of something, the vfs will do it for it 2008-09-09 20:46 this is the case in iput_final 2008-09-09 20:47 normally, inodes are dropped by generic_drop_inode 2008-09-09 20:47 there we see some classic unix 2008-09-09 20:48 the decision whether to delete an unlinked inode or not 2008-09-09 20:48 by this time, the dentry is long gone 2008-09-09 20:48 so is the directory entry, if i_nlink is zero 2008-09-09 20:48 we're nearly done for today 2008-09-09 20:49 I'm a little surpised by the fact than op can be null.. 2008-09-09 20:49 hour is coming up 2008-09-09 20:49 we still have 10 more minutes! :D 2008-09-09 20:49 the op can be null because the vfs does it in that case 2008-09-09 20:49 I'm going to answer questions for the next 10 minutes 2008-09-09 20:49 ACTION was about to ask about about deleting a directory 2008-09-09 20:50 so you than have a null op in the superblock operations and instead have it handled through such if statements all over the place? 2008-09-09 20:50 ok, let's go look a file_operations 2008-09-09 20:50 that seems like very non-OO 2008-09-09 20:50 maze, correy 2008-09-09 20:50 correct 2008-09-09 20:50 it's oo linux style 2008-09-09 20:50 very few linux hackers known any oo language 2008-09-09 20:50 is there a reason for that? that also seems worse performance wise... 
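On the null-`op` question: when the filesystem supplies no `drop_inode` method, `iput_final` falls back to `generic_drop_inode`. The generic policy deletes an inode whose link count is zero. Below is a toy model of that shape. The struct and function names are hypothetical stand-ins, and the generic policy is simplified to delete-or-keep.

```c
#include <assert.h>
#include <stddef.h>

struct toy_inode;
struct toy_super_ops { void (*drop_inode)(struct toy_inode *); };

struct toy_inode {
	unsigned nlink;                 /* like i_nlink */
	const struct toy_super_ops *op; /* may be NULL, or have NULL methods */
	int deleted, cached;            /* outcome flags for the demo */
};

/* Simplified generic policy: unlinked means delete, otherwise keep cached. */
static void toy_generic_drop_inode(struct toy_inode *inode)
{
	if (inode->nlink == 0)
		inode->deleted = 1;
	else
		inode->cached = 1;
}

/* The pattern discussed: a missing method means "use the VFS default". */
static void toy_iput_final(struct toy_inode *inode)
{
	void (*drop)(struct toy_inode *) = toy_generic_drop_inode;

	if (inode->op && inode->op->drop_inode)
		drop = inode->op->drop_inode;
	drop(inode);
}
```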
2008-09-09 20:51 since we then have the if instead of just calling the method 2008-09-09 20:51 it doesn't cost much cpu 2008-09-09 20:51 it's sloppy 2008-09-09 20:51 and looks ugly 2008-09-09 20:51 and is inconsistent 2008-09-09 20:51 it's another branch that can be mispreditcted though 2008-09-09 20:51 every operation has its own custom way of doing things, usually 2008-09-09 20:51 if the branch matters, we tell the compiler not to mispredict 2008-09-09 20:52 ugh... 2008-09-09 20:52 see "likely/uinlikely" 2008-09-09 20:52 unlikely 2008-09-09 20:52 yeah, I know 2008-09-09 20:52 the inefficiencies here are somewhat covered up by the fact that there are slow disks underneath 2008-09-09 20:52 and then, it's not really inefficient 2008-09-09 20:53 the stuff that _can_ cost lots of cpu has been profiled and fixed long ago 2008-09-09 20:53 ok, so it's just disgusting and extra code complexity ;-) 2008-09-09 20:53 these days, it costs a lot more to contend a spinlock than mispredict a branch 2008-09-09 20:53 yes, it's fairly disgusting 2008-09-09 20:53 one never learns to love it ;-) 2008-09-09 20:53 respect it, yes 2008-09-09 20:54 it does a lot, has a huge amount of flexibility 2008-09-09 20:54 ok, there was a question 2008-09-09 20:54 let's go look at how ext2 deletes a directory 2008-09-09 20:54 is stuff like this not fixable? 2008-09-09 20:54 right, delete a directory. 2008-09-09 20:54 the right person could fix it 2008-09-09 20:54 you have to have memorized stevens 2008-09-09 20:55 and you have to like fighting in pig shit 2008-09-09 20:55 do you llike fighting in pig shit? 2008-09-09 20:55 because you have some of the other qualifications ;-) 2008-09-09 20:56 I have a tendency to fight uphill battles, yes. 
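The `likely`/`unlikely` hints mentioned here wrap GCC's `__builtin_expect`, and the kernel's compiler.h defines them essentially as below. The sketch pairs them with the NULL-method check pattern just discussed. `toy_ops` and `toy_do_fsync` are hypothetical, and returning `-EINVAL` for a missing method is only an illustrative choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Branch hints for GCC/Clang; !!(x) normalizes any true value to 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

struct toy_ops { int (*fsync)(void *file); };

/* The hint tells the compiler the method is normally present, so the
 * call becomes the fall-through path and the error return is out of line. */
static int toy_do_fsync(const struct toy_ops *ops, void *file)
{
	if (unlikely(!ops || !ops->fsync))
		return -EINVAL;
	return ops->fsync(file);
}
```

As discussed in the log, the saving is small next to spinlock contention or disk latency. The hint only matters on hot paths.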
2008-09-09 20:56 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/namei.c#L275 <- ext2_rmdir 2008-09-09 20:56 pretty easy to read 2008-09-09 20:56 and write for that matter 2008-09-09 20:56 I didn't say uphill ;-) 2008-09-09 20:56 it's not a hill 2008-09-09 20:57 it's a ditch at the bottom of the farm 2008-09-09 20:57 flips: ack 2008-09-09 20:57 ah, but sh*t flows downhill, and if you're at the bottom 2008-09-09 20:57 stevens - which book is that referring to? 2008-09-09 20:57 when I asked the question I did not remember that the OS will refuse to delete a non-empty dir :P 2008-09-09 20:57 ext2_rmdir is plugged into this thing: http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/namei.c#L376 2008-09-09 20:57 an *_operations structure 2008-09-09 20:58 passes for an instance of a class in linux 2008-09-09 20:58 Advanced Programming in the UNIX Environment, Addison-Wesley, 1992. 2008-09-09 20:59 ok, now that we have found what ext2_rmdir is plugged into, we can follow it back up into the vfs 2008-09-09 20:59 click on ext2_dir_inode_operations 2008-09-09 21:00 sorry 2008-09-09 21:00 click on inode_operations 2008-09-09 21:00 http://lxr.linux.no/linux+v2.6.26.5/include/linux/fs.h#L1250 2008-09-09 21:00 then usage 2008-09-09 21:01 lxr is spinning 2008-09-09 21:01 this is the slowest operation... 2008-09-09 21:01 yes, and 3 doing it at the same time is enough to bring it to its knees 2008-09-09 21:01 apparently 2008-09-09 21:02 actually, I would say it is as slow as usual 2008-09-09 21:02 as you can see, this is a popular struct 2008-09-09 21:02 true 2008-09-09 21:02 you are looking for the instances that are _not_ in a specific filesystem 2008-09-09 21:03 fs/namei.c, line 2971 2008-09-09 21:03 for example 2008-09-09 21:03 whoops, not interesting 2008-09-09 21:03 bad_inode.c inode.c libfs.c 2008-09-09 21:03 razvanm probably had the right idea 2008-09-09 21:04 yes, inode.c is good 2008-09-09 21:04 http://lxr.linux.no/linux+v2.6.26.5/fs/inode.c#L114 2008-09-09 21:05 uhh...
we're 5 minutes over time :P 2008-09-09 21:05 ACTION says big thanks! 2008-09-09 21:05 ;-) 2008-09-09 21:05 yep 2008-09-09 21:05 so homework: 2008-09-09 21:05 find out where inode_operations->rmdir is called 2008-09-09 21:06 it isn't spelled that way 2008-09-09 21:06 this is what makes linux fun ;-) 2008-09-09 21:06 very little is spelled the way you would expect 2008-09-09 21:06 :D 2008-09-09 21:06 ok, did we have fun today? 2008-09-09 21:06 that was awesome, too short :) thanks to all... i love the format of this class :) 2008-09-09 21:06 ACTION did have fun :-) 2008-09-09 21:07 thanks natalie :-) 2008-09-09 21:07 thanx flips, was cool 2008-09-09 21:07 the most important item is how to navigate lxr 2008-09-09 21:07 welcome, ralucame 2008-09-09 21:07 ACTION is RalucaME's twin ;-) 2008-09-09 21:07 :) 2008-09-09 21:07 where it's called from outside of fs'es? or within? 2008-09-09 21:08 way too short - agreed. 2008-09-09 21:08 at this rate we'll need more than a few of these ;-) 2008-09-09 21:08 Thanks! 2008-09-09 21:08 aha 2008-09-09 21:08 from the vfs 2008-09-09 21:08 that is, outside the fs 2008-09-09 21:08 things eventually start to fit a pattern 2008-09-09 21:09 and you don't need me to suggest how to follow the twisty paths any more 2008-09-09 21:09 at first it looks like random gibberish 2008-09-09 21:09 then later, you learn it is actually random gibberish 2008-09-09 21:09 :-) 2008-09-09 21:09 but it is fast and flexible gibberish 2008-09-09 21:09 flips: i_op->rmdir ?
2008-09-09 21:10 sounds good 2008-09-09 21:10 ups ;-) 2008-09-09 21:10 aww 2008-09-09 21:10 I personally always spell my ops "ops" 2008-09-09 21:10 makes it much easier to navigate 2008-09-09 21:10 so you look for ops->rmdir and you always find it 2008-09-09 21:11 the code was "inode->i_op = &empty_iops;" in alloc_inode 2008-09-09 21:11 boring 2008-09-09 21:11 haven't found the real one now 2008-09-09 21:11 yet I mean 2008-09-09 21:12 http://lxr.linux.no/linux+v2.6.26.5/fs/namei.c#L2256 2008-09-09 21:13 that's it 2008-09-09 21:13 gold star 2008-09-09 21:14 :P 2008-09-09 21:16 oh, lol 2008-09-09 21:16 more homework? 2008-09-09 21:16 and I just sent it via pm 2008-09-09 21:16 figure out how a struct inode gets deleted ;-) 2008-09-09 21:17 ACTION will do pm next time 2008-09-09 21:17 wild guess: iput gets called? 2008-09-09 21:17 thursday again ok? 2008-09-09 21:17 at 8pm? 2008-09-09 21:17 maze, that's a good first order approximation 2008-09-09 21:17 yes 2008-09-09 21:17 sounds good 2008-09-09 21:18 see you then! 2008-09-09 21:19 cu 2008-09-09 21:41 -!- nataliep(~nataliep@72.14.224.1) has left #tux3 2008-09-09 22:09 -!- pranith(~ca4bcee2@66.90.73.223) has joined #tux3 2008-09-09 22:14 -!- pranith(~ca4bcee2@66.90.73.223) has joined #tux3 2008-09-09 22:15 hello 2008-09-09 22:15 anyone here? 2008-09-09 22:21 hi 2008-09-09 22:23 hi flips 2008-09-09 22:23 what's up? 2008-09-09 22:23 are u daniel phillips? 2008-09-09 22:23 yes 2008-09-09 22:24 ok, i mailed you yesterday about tux3 :) 2008-09-09 22:24 I remember 2008-09-09 22:24 welcome to #tux3 2008-09-09 22:24 thank you :) 2008-09-09 22:24 have you been reading the mailing list archives? 2008-09-09 22:24 hmm, not much actually 2008-09-09 22:25 that is a very good place to start 2008-09-09 22:25 im joining it now.. 2008-09-09 22:25 http://tux3.org/pipermail/tux3/ 2008-09-09 22:26 ok 2008-09-09 22:28 flips, any particular mail you want me to start with?
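To make "an *_operations structure passes for an instance of a class" concrete, here is a hypothetical toy filesystem that plugs an rmdir method into a cut-down inode_operations table with a designated initializer, the way ext2_dir_inode_operations plugs in ext2_rmdir, plus a VFS-side caller shaped like vfs_rmdir() in fs/namei.c. The `toyfs_` names and `vfs_rmdir_sketch` are invented; the real method takes a `struct dentry *` rather than a name, and the real vfs_rmdir() also does permission checks and locking omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct inode;

/* Cut-down inode_operations: a table of function pointers playing the role
 * of a class's virtual methods. Only rmdir is modelled. */
struct inode_operations {
	int (*rmdir)(struct inode *dir, const char *name);
};

/* Cut-down inode; nr_subdirs is a hypothetical stand-in for directory contents. */
struct inode {
	const struct inode_operations *i_op;
	int nr_subdirs;
};

/* A toy filesystem's rmdir "method". */
static int toyfs_rmdir(struct inode *dir, const char *name)
{
	(void)name;
	if (dir->nr_subdirs == 0)
		return -ENOENT;
	dir->nr_subdirs--;
	return 0;
}

/* Plugging the method in with a designated initializer, in the style of
 * ext2_dir_inode_operations = { ..., .rmdir = ext2_rmdir, ... }. */
static const struct inode_operations toyfs_dir_inode_operations = {
	.rmdir = toyfs_rmdir,
};

/* VFS side, shaped like vfs_rmdir(): the call is spelled dir->i_op->rmdir,
 * and a missing method means the operation is not permitted. */
static int vfs_rmdir_sketch(struct inode *dir, const char *name)
{
	if (!dir->i_op || !dir->i_op->rmdir)
		return -EPERM;
	return dir->i_op->rmdir(dir, name);
}
```

This is also why grepping for `inode_operations->rmdir` finds nothing: the call site names the instance field, `i_op->rmdir`.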
2008-09-09 22:28 any order 2008-09-09 22:28 just poke around until you find one that interests you 2008-09-09 22:29 and follow the thread 2008-09-09 22:29 see what people are doing 2008-09-09 22:29 the fuse stuff is very interesting 2008-09-09 22:29 ok 2008-09-09 22:35 excellent ramen 2008-09-09 22:35 that japanese store wasn't kidding when they said it was "a little spicy" 2008-09-09 22:36 hmm 2008-09-09 22:46 flips, do u think i need to have some background about file systems to start with? 2008-09-09 22:46 always helpful 2008-09-09 22:46 as i mentioned, i've just recently started going through the design book... 2008-09-09 22:46 there is a lot written on it 2008-09-09 22:47 a linux specific book would be good too 2008-09-09 22:47 I never read the beos book 2008-09-09 22:47 or any book on filesystem design ;-) 2008-09-09 22:47 i couldn't find any linux specific os book :( 2008-09-09 22:47 filesystem* 2008-09-09 22:47 "understanding the linux kernel" 2008-09-09 22:47 probably best for this 2008-09-09 22:48 for filesystems? 2008-09-09 22:48 wikipedia is good too 2008-09-09 22:48 yes 2008-09-09 22:48 ok 2008-09-09 22:48 http://www.yolinux.com/TUTORIALS/LinuxClustersAndFileSystems.html 2008-09-09 22:48 any concepts i need to pay particular attention to? 2008-09-09 22:48 vfs 2008-09-09 22:48 locking 2008-09-09 22:48 struct bio 2008-09-09 22:49 struct page 2008-09-09 22:49 struct inode 2008-09-09 22:49 struct dentry 2008-09-09 22:49 struct buffer_head 2008-09-09 22:50 http://en.wikipedia.org/wiki/Filesystem 2008-09-09 22:50 http://en.wikipedia.org/wiki/Ext2 2008-09-09 22:50 http://en.wikipedia.org/wiki/Ext3 2008-09-09 22:51 http://en.wikipedia.org/wiki/Journaling_file_system 2008-09-09 22:51 http://en.wikipedia.org/wiki/Comparison_of_file_systems 2008-09-09 22:51 http://en.wikipedia.org/wiki/ACID 2008-09-09 22:51 <- very important 2008-09-09 22:51 hmm, ACID?
first time i'm hearing about this 2008-09-09 22:52 it is the most important concept of all 2008-09-09 22:52 ohk, i knew abt these. just never heard the term ACID :) 2008-09-09 22:53 knew in the sense, i heard abt them. not "know" knowing :) 2008-09-09 22:53 need to memorize those concepts 2008-09-09 22:53 ok 2008-09-09 23:14 feh I just got bitten by the stupid ext2 convention that zero inode means deleted entry 2008-09-09 23:14 what's the matter with having an inode numbered zero again? 2008-09-09 23:14 zero inode? 2008-09-09 23:14 inum = 0 2008-09-09 23:15 hmm 2008-09-09 23:15 that's what ext2 uses to determine a dirent is deleted 2008-09-09 23:15 a little better than DOS, which sets the first character of the filename to 0xE5 2008-09-09 23:15 but not much 2008-09-09 23:16 why not rely on name_len = zero instead? 2008-09-09 23:16 dumb 2008-09-09 23:16 I might change that for tux3 2008-09-09 23:16 u better do :) 2008-09-09 23:17 can't call it ext2_create_entry any more then ;-) 2008-09-09 23:17 so far it's exactly compatible 2008-09-09 23:17 but this is annoying 2008-09-09 23:18 I think I will at least create an is_deleted macro 2008-09-09 23:18 everywhere it relies on inum = 0 2008-09-09 23:18 might sound stupid, but i dont know.. whats the need for a deleted dirent? 2008-09-09 23:18 so you can recover the space to use for some other filename 2008-09-09 23:19 isn't that recovered during deletion time? 2008-09-09 23:19 that's the code I'm working on 2008-09-09 23:20 oh! :) 2008-09-09 23:20 and inum is what? the number of inodes? 2008-09-09 23:22 static inline int is_deleted(ext2_dirent *dirent) 2008-09-09 23:22 { 2008-09-09 23:22 return !dirent->inum; 2008-09-09 23:22 } 2008-09-09 23:23 hmm 2008-09-09 23:23 thats nice 2008-09-09 23:45 -!- cdk(~chinmay@121.246.36.139) has joined #tux3 2008-09-09 23:46 can now find/create xattr atoms in the atom table 2008-09-09 23:46 enough for today 2008-09-09 23:48 sleeping? 2008-09-09 23:50 soon 2008-09-09 23:57 which place?
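A minimal sketch of the ext2-style convention under discussion, assuming a simplified in-memory dirent (inum, rec_len, name_len, type, name) with no endian conversion. A deleted entry is marked with inum 0, and, as ext2_delete_entry() does, its space is recovered by growing the previous entry's rec_len over it; a later insert can then reuse the slack. Because the test lives in is_deleted(), switching to a name_len == 0 convention would be a one-line change. BLOCK_SIZE, put_entry() and delete_entry() are hypothetical helpers for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified in-memory dirent in the spirit of ext2's on-disk layout. */
typedef struct {
	uint32_t inum;       /* 0 marks a deleted entry */
	uint16_t rec_len;    /* bytes from this entry to the next */
	uint8_t name_len;
	uint8_t type;
	char name[];
} ext2_dirent;

enum { BLOCK_SIZE = 64 };   /* hypothetical tiny block */

static inline int is_deleted(ext2_dirent *dirent)
{
	return !dirent->inum;
}

/* Hypothetical helper: write one entry at `at`. */
static void put_entry(char *at, uint32_t inum, const char *name, uint16_t rec_len)
{
	ext2_dirent *d = (ext2_dirent *)at;
	d->inum = inum;
	d->rec_len = rec_len;
	d->name_len = (uint8_t)strlen(name);
	d->type = 0;
	memcpy(d->name, name, d->name_len);
}

/* Delete `name` from a block, modelled on ext2_delete_entry(): the previous
 * entry's rec_len grows to cover the victim, and the victim's inum is zeroed,
 * which is the only marker left when it is the first entry in the block.
 * Returns 0, or -1 if the name is not found. */
static int delete_entry(char *block, unsigned size, const char *name)
{
	size_t len = strlen(name);
	ext2_dirent *prev = NULL;
	for (char *p = block; p < block + size; ) {
		ext2_dirent *d = (ext2_dirent *)p;
		if (!is_deleted(d) && d->name_len == len && !memcmp(d->name, name, len)) {
			if (prev)
				prev->rec_len += d->rec_len;
			d->inum = 0;
			return 0;
		}
		prev = d;
		p += d->rec_len;
	}
	return -1;
}
```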
2008-09-09 23:59 santa monica, CA 2008-09-09 23:59 and you? 2008-09-10 00:09 new delhi, india 2008-09-10 00:10 what do you do? 2008-09-10 00:13 hey 2008-09-10 00:14 hi bh 2008-09-10 00:23 johns hopkins 2008-09-10 00:23 u 2008-09-10 00:24 wrong channel 2008-09-10 00:35 hmm 2008-09-10 00:35 u work at johns hopkins? 2008-09-10 00:37 no, in santa monica 2008-09-10 00:37 linux kernel hacker 2008-09-10 00:37 time to sleep 2008-09-10 00:37 see you later 2008-09-10 00:50 :) 2008-09-10 00:50 goodnite 2008-09-10 01:52 -!- kbingham(~kbingham@92.0.11.166) has joined #tux3 2008-09-10 03:13 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3 2008-09-10 06:07 -!- stargazr5(~gauravstt@59.95.3.98) has joined #tux3 2008-09-10 07:02 -!- stargazr5(~gauravstt@59.95.35.187) has joined #tux3 2008-09-10 08:03 -!- RzM|Away(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-10 09:28 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-10 10:36 -!- Kirantpatil(~kiran@122.167.179.145) has joined #tux3 2008-09-10 10:37 -!- Kirantpatil(~kiran@122.167.179.145) has left #tux3 2008-09-10 11:19 -!- Bobby(~Bobby@122.160.64.177) has joined #tux3 2008-09-10 11:19 hello 2008-09-10 11:25 -!- Bobby(~Bobby@122.160.64.177) has joined #tux3 2008-09-10 11:51 -!- Bobby(~Bobby@122.160.64.177) has joined #tux3 2008-09-10 12:19 morning 2008-09-10 12:35 reading the lesson from last night 2008-09-10 12:35 test of fire 2008-09-10 12:49 int set_xattr(struct inode *inode, char *name, unsigned len, void *data, unsigned size) 2008-09-10 12:49 { 2008-09-10 12:49 atom_t atom = get_atom(inode->sb->atable, name, len); 2008-09-10 12:49 return xcache_update(inode, atom, data, len); 2008-09-10 12:49 } 2008-09-10 12:49 short and sweet? 2008-09-10 12:51 yes 2008-09-10 12:52 should that be xcache_update(inode, atom, data, size)? 
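set_xattr() above hands the xattr name to get_atom() and keeps only the resulting atom number with the inode. The following hypothetical in-memory atom table shows the find-or-create behaviour with a plain linear scan; it is not tux3's on-disk atom table, and the struct layout, growth policy and 0-on-failure convention are assumptions. Since the table is passed in explicitly, a caller could keep several tables side by side, for example one per xattr namespace.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned atom_t;   /* 0 is never a valid atom here */

/* Hypothetical in-memory atom table mapping xattr names to small numbers. */
struct atom { char *name; unsigned len; };
struct atom_table { struct atom *atoms; unsigned count, capacity; };

/* Find the atom for name[0..len), creating it if absent.
 * Atoms are numbered from 1; returns 0 if memory runs out. */
static atom_t get_atom(struct atom_table *table, const char *name, unsigned len)
{
	for (unsigned i = 0; i < table->count; i++)
		if (table->atoms[i].len == len && !memcmp(table->atoms[i].name, name, len))
			return i + 1;
	if (table->count == table->capacity) {
		unsigned capacity = table->capacity ? 2 * table->capacity : 8;
		struct atom *grown = realloc(table->atoms, capacity * sizeof *grown);
		if (!grown)
			return 0;
		table->atoms = grown;
		table->capacity = capacity;
	}
	char *copy = malloc(len ? len : 1);
	if (!copy)
		return 0;
	memcpy(copy, name, len);
	table->atoms[table->count] = (struct atom){ copy, len };
	return ++table->count;
}
```

Only the first `len` bytes of the name count, so passing the right length matters just as much as passing the right data size to xcache_update().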
2008-09-10 12:54 it should 2008-09-10 12:54 now is 2008-09-10 12:55 was originally going to be namelen and datalen 2008-09-10 12:55 it reads better as it is now 2008-09-10 12:55 now let's see if it works 2008-09-10 12:57 the test sets xattrs on the atom table directory inode 2008-09-10 12:58 it works 2008-09-10 12:58 set_xattr(inode, "foo", 3, "bar", 3); 2008-09-10 12:58 xcache_dump(inode); 2008-09-10 12:58 xattr 1: 0x8050fc8: 62 61 72 "bar" 2008-09-10 12:58 the test would have passed even with the bug above ;-) 2008-09-10 13:10 -!- kbingham(~kbingham@92.20.206.84) has joined #tux3 2008-09-10 13:14 next step is to get xattrs on/off disk 2008-09-10 13:28 notice in the above factoring we can easily have multiple atom tables 2008-09-10 13:28 I wonder what that is good for 2008-09-10 13:28 multiple xattr namespaces 2008-09-10 13:31 -!- kbingham(~kbingham@92.22.74.132) has joined #tux3 2008-09-10 13:35 "Stasticially speaking, the part of your disk that loses data is probably in the movies that suck, not the movies that are good, simply because the vast majority of movies suck.
" -- http://alumnit.ca/~apenwarr/log/?m=200809#08 2008-09-10 13:35 wiser words were seldom spoken 2008-09-10 13:43 ACTION discusses concurrency issues with flips 2008-09-10 13:45 thinking about how I'd parallelize b-tree operations 2008-09-10 13:47 righto 2008-09-10 13:48 first obvious invariant: lock acquisition order is root-to-leaf 2008-09-10 13:49 obvious optimization rule: nodes near the root must not stay locked for long periods, including never locked across a read of disk data, other than the data of the node itself 2008-09-10 13:51 nuther obvious one: no holding of locks above a node being read from disk 2008-09-10 13:51 this one is kinda tough sometimes, in cases like rename where the vfs holds locks 2008-09-10 13:59 here's a subtle and important one: subtrees below a locked node can be locked by another process, this is normal and important for throughput 2008-09-10 14:00 time for another hit of that excellent ramen 2008-09-10 14:28 hmm, I should probably stop putting contributors' email addys in the first line of the log 2008-09-10 14:28 where spambots can get them 2008-09-10 14:29 second line is most likely ok, it's the contributor's call: do they want fame + spam? or just satisfaction? 2008-09-10 14:29 personally I try to err on the side of fame and have efficient spam filters 2008-09-10 14:31 my email is in tons of public places already, I've already got decent spam filters 2008-09-10 14:33 finished reading through last night's lesson 2008-09-10 14:33 and accompanying lxr pages 2008-09-10 14:34 and? 2008-09-10 14:35 ACTION can confirm that this ramen is excellent 2008-09-10 14:37 so my question (maybe it's stupid): what is an inode?
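The B-tree locking rules listed above (root-to-leaf acquisition order, no upper-level lock held across a disk read, other threads free to lock disjoint subtrees) can be sketched as lock coupling on a lookup path: lock the child, release the parent, and only then read the child in if it is not cached. This is a hypothetical read-only sketch using pthread mutexes; the node layout, FANOUT, the `cached` flag and load_node() are invented, nodes are assumed never to be freed, and it handles neither splits nor the vfs rename case mentioned in the log.

```c
#include <assert.h>
#include <pthread.h>

#define FANOUT 4

/* Hypothetical B-tree node. Internal nodes route by separator keys;
 * leaves hold keys and values. Nodes are never freed in this sketch. */
struct node {
	pthread_mutex_t lock;
	int leaf;
	int cached;                      /* 0: contents must be read from disk first */
	int nkeys;
	int keys[FANOUT];
	int vals[FANOUT];                /* leaves only */
	struct node *child[FANOUT + 1];  /* internal nodes only */
};

/* Hypothetical stand-in for reading a node's block from disk. */
static void load_node(struct node *node)
{
	node->cached = 1;
}

/* Lookup with lock coupling: locks are taken strictly root-to-leaf, and a
 * parent is released as soon as its child is locked, so no upper-level lock
 * is held while the child is read in, and other threads can lock other
 * subtrees meanwhile. Assumes the root is always cached. */
static int btree_lookup(struct node *root, int key, int *val)
{
	struct node *node = root;
	pthread_mutex_lock(&node->lock);
	while (!node->leaf) {
		int i = 0;
		while (i < node->nkeys && key >= node->keys[i])
			i++;
		struct node *child = node->child[i];
		pthread_mutex_lock(&child->lock);    /* child after parent */
		pthread_mutex_unlock(&node->lock);   /* drop parent before any I/O */
		if (!child->cached)
			load_node(child);                /* only this node's own lock held */
		node = child;
	}
	int found = 0;
	for (int i = 0; i < node->nkeys; i++) {
		if (node->keys[i] == key) {
			*val = node->vals[i];
			found = 1;
			break;
		}
	}
	pthread_mutex_unlock(&node->lock);
	return found;
}
```

Releasing the parent before the read is what keeps hot upper levels available; the price, in a tree that allows concurrent splits, is that the child must be revalidated (or the descent restarted) after the read, which this lookup-only sketch does not need to do.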
2008-09-10 14:37 my current guess is 'an in-memory representation of the stuff surrounding a file, except for the contents' 2008-09-10 14:38 it's two or three things that are often conflated 2008-09-10 14:39 1) (the real def) it is a structure that caches the details of a filesystem object or device object in the vfs 2008-09-10 14:39 2) it is the image of an above object in file store 2008-09-10 14:40 3) it is the numeric id of the backing store image of an inode 2008-09-10 14:40 actually, 0) it is an object that caches all the data and attributes of a filesystem object or system device 2008-09-10 14:41 the notion of object being separate from how it is cached or stored 2008-09-10 14:41 so your guess is close to the mark 2008-09-10 14:41 very close 2008-09-10 14:41 alright 2008-09-10 14:41 in tux3, sometimes caches the contents as well 2008-09-10 14:42 well 2008-09-10 14:42 always caches the contents 2008-09-10 14:42 because it points at a "mapping" 2008-09-10 14:42 which is a radix tree that points at the cached pages of the data of the inode 2008-09-10 14:43 also caches the xattr data, which in tux3 is nearly the same as the file data 2008-09-10 14:43 heh 2008-09-10 14:43 how do ext[234] and friends handle xattr data and quotas?
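To illustrate the "mapping" just described, here is a hypothetical fixed-height radix tree keyed by page index whose bottom level points at cached pages. The kernel's struct address_space uses the general radix tree in lib/radix-tree.c, which grows its height on demand and uses 64-slot nodes by default; the 16-slot nodes, fixed three-level height and function names below are assumptions made to keep the sketch short.

```c
#include <assert.h>
#include <stdlib.h>

#define RADIX_BITS   4
#define RADIX_SLOTS  (1u << RADIX_BITS)
#define RADIX_HEIGHT 3      /* covers page indexes 0 .. 4095 */

struct radix_node {
	void *slots[RADIX_SLOTS];   /* child nodes, or pages at the bottom level */
};

/* Hypothetical stand-in for an inode's address_space. */
struct mapping {
	struct radix_node *root;
};

/* Which slot of a node at `level` the index selects (level 0 = bottom). */
static unsigned slot_at(unsigned long index, int level)
{
	return (index >> (level * RADIX_BITS)) & (RADIX_SLOTS - 1);
}

/* Return the cached page at `index`, or NULL if none is cached. */
static void *find_page(const struct mapping *map, unsigned long index)
{
	if (index >> (RADIX_HEIGHT * RADIX_BITS))
		return NULL;
	struct radix_node *node = map->root;
	for (int level = RADIX_HEIGHT - 1; node && level > 0; level--)
		node = node->slots[slot_at(index, level)];
	return node ? node->slots[slot_at(index, 0)] : NULL;
}

/* Cache `page` at `index`, allocating interior nodes on demand; 0 or -1. */
static int add_page(struct mapping *map, unsigned long index, void *page)
{
	if (index >> (RADIX_HEIGHT * RADIX_BITS))
		return -1;
	if (!map->root && !(map->root = calloc(1, sizeof *map->root)))
		return -1;
	struct radix_node *node = map->root;
	for (int level = RADIX_HEIGHT - 1; level > 0; level--) {
		void **slot = &node->slots[slot_at(index, level)];
		if (!*slot && !(*slot = calloc(1, sizeof(struct radix_node))))
			return -1;
		node = *slot;
	}
	node->slots[slot_at(index, 0)] = page;
	return 0;
}
```

A sparse file only pays for the interior nodes along the paths to pages that are actually cached.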
2008-09-10 14:44 weirdly 2008-09-10 14:44 in both cases 2008-09-10 14:44 read ext3/xattr.c 2008-09-10 14:44 going 2008-09-10 14:45 tries to pack xattrs for different inodes together in blocks, then notices when entire xattr blocks are the same and has multiple pointers to them from different inodes 2008-09-10 14:45 quotas are done through this awful vfs-level abstraction 2008-09-10 14:45 quota files 2008-09-10 14:45 a real mess 2008-09-10 14:45 wrong idea 2008-09-10 14:46 I am not sure whether there is any connection between xattrs and quota in ext* 2008-09-10 14:47 actually, the vfs thoughtfully provides a bypass around the quota file mess so a filesystem that wants to do it right can do so 2008-09-10 14:47 don't know if anybody uses that bypass 2008-09-10 14:52 for ext3: All attributes must fit in the inode and one additional block. 2008-09-10 14:55 right. lame. 2008-09-10 14:56 heh. 2008-09-10 14:57 tux3 goes at it more like HFS file fork 2008-09-10 14:57 ext3 uses the macros le32_to_cpu (and equivalently for tux3, be??_to_cpu) 2008-09-10 14:57 strangely, macos doesn't do xattrs like file forks, it limits them like ext* 2008-09-10 14:57 I'm going to respell those I think 2008-09-10 14:57 they're clunky 2008-09-10 14:58 from_be_u32 and to_be_u32 <- less clunky 2008-09-10 14:59 or from_beu32 and to_beu32 2008-09-10 15:00 or from_u32b and to_u32b 2008-09-10 15:00 vs from_u32l and to_u32l 2008-09-10 15:00 or from_u32be and to_u32be 2008-09-10 15:00 vs from_u32le and to_u32le 2008-09-10 15:01 not quite decided 2008-09-10 15:01 but probably will do a big spam edit in the next couple of days to make it better than it is 2008-09-10 15:01 we have a lot of endian work ahead of us and the inlines should support it, not get in the way 2008-09-10 15:03 from_beu32 and to_beu32 <- this is probably the form that is the easiest the edit and least likely to offend kernel hacks 2008-09-10 15:03 easiest to edit I mean 2008-09-10 15:04 in kernel they will likely just be #defined to
be the kernel faves 2008-09-10 15:04 it's pathetic that gcc doesn't just make this an attribute 2008-09-10 15:56 my vote: 2008-09-10 15:56 from_u32be and to_u32be 2008-09-10 15:57 ok 2008-09-10 15:57 seems easiest to read 2008-09-10 15:57 I think yours is the casting vote because you are the only one who voted 2008-09-10 15:57 ;-) 2008-09-10 15:59 I know there's already been tons of discussion about C++ in the kernel, but sometimes, some aspects of it (OO, private fields, accessors) would make the interfaces so much cleaner... 2008-09-10 15:59 it's too bad some sort of OO shim can't be included in C 2008-09-10 16:00 really 2008-09-10 16:00 too bad the C and C++ camps don't even speak to each other any more 2008-09-10 16:00 feel free to write a chunk of tux3 in c++ if you want 2008-09-10 16:00 for example, tux3.c 2008-09-10 16:01 c++ desperately needs designated initializers 2008-09-10 16:13 maze, there ya go 2008-09-10 16:14 hmm? what do you mean? 2008-09-10 16:14 I see that the tux3 announcement is more popular than the 2.6.24.4 announcement on lkml.org 2008-09-10 16:14 endian conversions respelled according to your taste 2008-09-10 16:14 2.6.24.7 was out already ;-) 2008-09-10 16:14 oh, right, cool! 2008-09-10 16:14 so you probably mean 2.6.26.4 which was a screw up and was soon 2.6.26.5 2008-09-10 16:15 heh 2008-09-10 16:15 you'd think that would make the announcement even more popular 2008-09-10 16:15 what was the screw up? 2008-09-10 16:15 it didn't compile ;-) 2008-09-10 16:16 wow 2008-09-10 16:16 really? 2008-09-10 16:16 how could that happen 2008-09-10 16:16 at least some relatively important option 2008-09-10 16:16 we should donate gregkh a computer 2008-09-10 16:16 http://kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.26.5 2008-09-10 16:17 there's a build option exactly to catch stuff like that 2008-09-10 16:17 and... how about trying to build before posting?
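One possible shape for the from_u32be/to_u32be spelling that won the vote, assuming hypothetical wrapper types so that on-disk values cannot be mixed with host-order integers by accident (the kernel itself uses `__be32`/`__le32` checked by sparse instead). The byte-at-a-time conversion works on either host byte order; in the kernel these would presumably just be #defined to be32_to_cpu()/cpu_to_be32() and friends. The little-endian pair is included for symmetry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk types: wrapping the raw bits in a struct makes it a
 * compile error to use an on-disk value as a host-order integer. */
typedef struct { uint32_t raw; } be_u32;
typedef struct { uint32_t raw; } le_u32;

static inline uint32_t from_u32be(be_u32 x)
{
	unsigned char b[4];
	memcpy(b, &x.raw, 4);
	return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 | (uint32_t)b[2] << 8 | b[3];
}

static inline be_u32 to_u32be(uint32_t x)
{
	unsigned char b[4] = {
		(unsigned char)(x >> 24), (unsigned char)(x >> 16),
		(unsigned char)(x >> 8), (unsigned char)x,
	};
	be_u32 out;
	memcpy(&out.raw, b, 4);
	return out;
}

static inline uint32_t from_u32le(le_u32 x)
{
	unsigned char b[4];
	memcpy(b, &x.raw, 4);
	return (uint32_t)b[3] << 24 | (uint32_t)b[2] << 16 | (uint32_t)b[1] << 8 | b[0];
}

static inline le_u32 to_u32le(uint32_t x)
{
	unsigned char b[4] = {
		(unsigned char)x, (unsigned char)(x >> 8),
		(unsigned char)(x >> 16), (unsigned char)(x >> 24),
	};
	le_u32 out;
	memcpy(&out.raw, b, 4);
	return out;
}
```

Compilers typically recognise these shift-and-or patterns and emit a single load or byte-swap instruction, so the portable form need not cost anything.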
2008-09-10 16:17 not that I always do it 2008-09-10 16:17 oh, at least he fixed it by releasing a new number 2008-09-10 16:18 the consequences of messing that up are less far reaching with tux3 than stable linux 2008-09-10 16:18 I've got some best-left-unnamed teams that like to release bugfixed versions of sw with the same version number 2008-09-10 16:18 right, and hundreds or thousands of people who innocently download the screwup will be confused and/or annoyed 2008-09-10 16:19 people don't download that quickly 2008-09-10 16:19 because one of the things gregkh also doesn't do, is rename screwups as -dontuse 2008-09-10 16:19 it was fixed in 6 hours 2008-09-10 16:19 yes they do 2008-09-10 16:19 most people pick it up through the distros 2008-09-10 16:19 check out the load spikes on lkml.org 2008-09-10 16:19 sorry 2008-09-10 16:19 kernel.org 2008-09-10 16:19 I'm still running 2.6.26.3 and waiting for a reboot to hit 2.6.26.5... 2008-09-10 16:20 with even my laptop getting average uptimes of 3-4 weeks, reboots ain't that often, so upgrades ain't that often 2008-09-10 16:20 behind ;-) 2008-09-10 16:20 I won't mention what I'm running ;-) 2008-09-10 16:20 my workstation is also the tux3.org server 2008-09-10 16:20 I really want 2.6.27.1 to come out though 2008-09-10 16:20 so I don't reboot much 2008-09-10 16:20 heh 2008-09-10 16:20 or try the latest flights of fancy of kernel devs too soon 2008-09-10 16:20 the wireless driver ath9k should fix a lot of my wireless woes 2008-09-10 16:21 me too 2008-09-10 16:21 <- notice the 2.6.27.1 ;-) when 2.6.27 isn't out yet 2008-09-10 16:21 I figure once 2.6.27.1 is out 2.6.27 shouldn't eat your data anymore 2008-09-10 16:21 looking forward to it with almost as much anticipation as an open ati driver that performs better than 50% as well as the bespoke one 2008-09-10 16:21 heh 2008-09-10 16:22 I did upgrade to 2.6.27rc3 and I'm not sure whether it was ath9k, or 2.6.27 or a bad build 2008-09-10 16:22 ok, have to do something
about a haircut now 2008-09-10 16:22 or bad config options 2008-09-10 16:22 before jumping back in and finishing up adding xattr support to tux3.c 2008-09-10 16:22 but it promptly ran out of swiotlb buffers and corrupted my hard disk 2008-09-10 16:23 my ath woes are taken care of by having an eee 2008-09-10 16:23 spent a weekend recovering... thankfully had a one week old backup (earlier that week my laptop stopped booting...) 2008-09-10 16:23 they messed with setting up the binary/evil driver 2008-09-10 16:23 works great 2008-09-10 16:23 course I've got a bunch of other aths around that would benefit 2008-09-10 16:23 my wife's machine for example 2008-09-10 16:24 I'm using madwifi drivers now on a macbook pro 3,1 - they work, but occasionally they disconnect, and you need to unload and reload the entire wireless stack 2008-09-10 16:24 which has a wire running into it because I don't have the energy to mess with the braindamaged firmware load 2008-09-10 16:24 madwifi has been solid as a rock for me in the 4-5 years I've used it 2008-09-10 16:24 on a pci wireless 2008-09-10 16:25 I have a 'fix-wireless.sh' running in an xterm - it pings default gateway, if it's unreachable for 5 seconds, then promptly shuts down dhcpc/wpa_supplicant/wireless and brings it all back up - best part is you don't even lose existing established network connections (ssh, etc) 2008-09-10 16:26 it's still annoying though, because you occasionally get these 15 second pauses (happens maybe once an hour) 2008-09-10 16:26 :p 2008-09-10 16:27 I was running a very old incarnation of madwifi 2008-09-10 16:27 likely some value in that 2008-09-10 16:27 couldn't get the latest working after trying for an hour or so, so just plugged in the wire 2008-09-10 16:27 which is way faster anyway 2008-09-10 16:28 agreed, you want wired for everything stationary 2008-09-10 16:28 laptops aren't stationary though...
2008-09-10 16:29 right, which is why I love my eee 2008-09-10 16:29 don't have to worry about a thing, somebody else does 2008-09-10 16:30 not to mention the fact that it fits comfortably in the flap of my camera backpack 2008-09-10 16:32 it runs linux? 2008-09-10 16:32 yes 2008-09-10 16:32 beautifully 2008-09-10 16:32 by default? 2008-09-10 16:33 everybody loves it here 2008-09-10 16:33 yes 2008-09-10 16:33 hmm 2008-09-10 16:33 funny thing is, the linux and windows versions cost the same, but you get 20G of flash disk with linux and only 12G with XP 2008-09-10 16:33 901 is the one to get 2008-09-10 16:34 I have the 900 and I'll probably pick up a 901 pretty soon 2008-09-10 16:34 need more than one for this family 2008-09-10 16:40 tux3 has bloated up to ~6,500 lines of .c + .h, sparsely commented and densely written, including unit tests 2008-09-10 16:41 so I think the kernel port will come in around 10K lines, complete with versioning, commit transactions and shiny new directory index 2008-09-10 16:41 about 1-2 months away from knowing 2008-09-10 16:41 depends a lot on how tux3 university goes ;-) 2008-09-10 16:52 http://userweb.kernel.org/~warthog9/damaged_server/ <- wow 2008-09-10 16:52 I'll think twice about using fedex 2008-09-10 16:57 sk8 oclock 2008-09-10 17:03 ouch 2008-09-10 17:03 are you counting the somewhat redundant tux3fuse/tux3fs in those 6500? 
2008-09-10 18:30 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-10 18:51 -!- nataliep_(~nataliep@66-102-14-1.google.com) has joined #tux3 2008-09-10 19:22 ACTION just realized that the next tux3 university is tomorrow and not today :P 2008-09-10 19:49 konrad, yes 2008-09-10 19:49 konrad, it was a joke 2008-09-10 19:49 6,500 lines at this point is really tight 2008-09-10 19:50 also includes the buffer and page cache emulation 2008-09-10 19:51 and none of tux3.c belongs in kernel, though it is probably the base on which tux3 mkfs and fsck will be built 2008-09-10 19:54 incidentally, that is about 1,000 lines/week 2008-09-10 19:54 a respectable pace 2008-09-10 19:55 especially considering the rewrite ratio is really high 2008-09-10 19:56 ACTION extracts the cork from a bottle of cabernet 2008-09-10 19:57 nice um, tasty california wine to go with the pasta 2008-09-10 19:57 unpretentious, unsophisticated, gives you its phone number right away 2008-09-10 19:58 top note of jelly beans 2008-09-10 20:55 next, inode.c needs to know about loading and saving xattrs 2008-09-10 21:29 flips: very respectable 2008-09-10 21:29 especially if it all works 2008-09-10 21:30 the babernet worked fine 2008-09-10 21:30 cabernet 2008-09-10 21:30 <- proof 2008-09-10 21:30 ACTION has to arrange another cabal meeting 2008-09-10 21:31 bh, when are you up here next? 2008-09-10 21:32 don't know 2008-09-10 21:32 I mean, really any time I want 2008-09-10 21:32 doesn't take long for san diego does it? 2008-09-10 21:32 right 2008-09-10 21:32 you're not that far away 2008-09-10 21:32 I'll ping you when the next cabernet meeting comes up 2008-09-10 21:33 cabernet ? 2008-09-10 21:33 err 2008-09-10 21:33 cabal ;-) 2008-09-10 21:33 my tastes run more to bordeaux at cabal meetings 2008-09-10 21:33 or sake 2008-09-10 21:33 sorry, oh ok, well, that would be great. Did one happen today ? 2008-09-10 21:33 no, last was a week or so ago 2008-09-10 21:33 who came ?
2008-09-10 21:33 can't say, it's a cabal 2008-09-10 21:34 good peeps 2008-09-10 21:34 ok 2008-09-10 22:08 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-10 22:19 -!- stargazr5(~gauravstt@59.95.19.195) has joined #tux3 2008-09-10 22:54 -!- pgquiles(~pgquiles@253.Red-83-44-239.dynamicIP.rima-tde.net) has joined #tux3 2008-09-11 00:30 inode table block 0x0/15 (f2c bytes free) 2008-09-11 00:30 0x0: new_xcache: realloc xcache to 9999 2008-09-11 00:30 mode 0000000 uid 0 gid 0 root 4:1 ctime 0 size 200 2008-09-11 00:30 0x2: new_xcache: realloc xcache to 9999 2008-09-11 00:30 mode 0000000 uid 0 gid 0 root 6:1 2008-09-11 00:30 0xa: new_xcache: realloc xcache to 9999 2008-09-11 00:30 mode 0000000 uid 0 gid 0 root a:1 2008-09-11 00:30 0xd: new_xcache: realloc xcache to 9999 2008-09-11 00:30 mode 0040755 uid 0 gid 0 root 8:1 2008-09-11 00:30 0xe: new_xcache: realloc xcache to 9999 2008-09-11 00:30 mode 0100700 uid 0 gid 0 root d:1 ctime 0 size 1008 xattr(s) 2008-09-11 00:30 {1} => 0x805f110: 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 "hello world!" 2008-09-11 00:30 inode 0xe (14) has an extended attribute with atom number 1 and body "hello world!" 2008-09-11 00:31 so an xattr made it into the inode table 2008-09-11 00:31 and onto disk I think 2008-09-11 00:31 need to verify that by trying to get it back 2008-09-11 01:29 nice 2008-09-11 02:16 one step closer to ruling the world 2008-09-11 02:18 flips: what you were trying to do was add reference counting to attributes ? 2008-09-11 02:18 but aborted it ? 2008-09-11 02:18 aborted? 2008-09-11 02:18 no reference counting just now 2008-09-11 02:18 just trying to get xattrs onto disk and back off 2008-09-11 02:18 very close to that now 2008-09-11 02:19 right there was some discussion about it and you decided to go with a simpler approach 2008-09-11 02:19 I decided to go with reference counting 2008-09-11 02:19 yeah, looks like it 2008-09-11 02:19 oh really ? 
2008-09-11 02:19 but not just yet 2008-09-11 02:19 have you thought about having extensions for ease of use with samba ? 2008-09-11 02:19 they just want xattrs that work well 2008-09-11 02:20 ok 2008-09-11 02:20 tridge was disappointed with the performance of pretty well every filesystem wrt xattrs 2008-09-11 02:21 it's a hard problem to solve 2008-09-11 02:21 more folks just ignore it 2008-09-11 02:21 more=most 2008-09-11 02:21 generally done badly from what I've seen 2008-09-11 02:21 I like the way it's coming out in tux3 2008-09-11 02:21 store_attrs: Failed assertion "attr == base + size"! 2008-09-11 02:21 Trace/breakpoint trap 2008-09-11 02:21 got to debug 2008-09-11 02:22 ok 2008-09-11 02:23 ah I see the problem 2008-09-11 02:24 the attribute size estimation done for xattrs before saving inode should not include the xattr header size, only the variable data part 2008-09-11 02:32 there, got my xattr back 2008-09-11 02:32 let's see if I can set a new one 2008-09-11 02:32 yay 2008-09-11 02:32 nope, that's the problem, the second set fails 2008-09-11 02:32 but it's progress 2008-09-11 02:33 very definite progress 2008-09-11 02:46 it works now 2008-09-11 02:46 konrad, you can say yay for real ;-) 2008-09-11 02:47 yay for real 2008-09-11 02:47 :-) 2008-09-11 02:47 bug was in a part of the code you worked on 2008-09-11 02:47 for (int kind = MIN_ATTR; kind < VAR_ATTRS; kind++) { 2008-09-11 02:47 but xattrs didn't exist then 2008-09-11 02:48 the attribute encoder now has two parts 2008-09-11 02:48 the part that encodes 'standard' attributes 2008-09-11 02:48 and the part that encodes extended attributes from the xcache 2008-09-11 02:48 ah 2008-09-11 02:49 the standard attribute encoder better not write out headers for extended attributes, which it was doing 2008-09-11 02:49 this part of the code is going to evolve a lot as things progress 2008-09-11 02:50 it gets more complex when versioning arrives 2008-09-11 02:50 then we can't just blindly overwrite the entire set of
attributes in the version table 2008-09-11 02:50 because the inode only has the attributes for one version 2008-09-11 02:50 attributes for other versions have to be left alone 2008-09-11 02:50 messy 2008-09-11 02:51 but also some weeks away 2008-09-11 02:51 this code will do for the nonversioning prototype 2008-09-11 02:55 committed 2008-09-11 02:55 enough for today 2008-09-11 02:58 I think I need to reward myself with a pair of these: http://www.skatehut.co.uk/acatalog/Seba_FR1_Skates_-_Orange_White___195.html 2008-09-11 02:59 £200 is a lot in USD 2008-09-11 03:01 can get them for $350 here 2008-09-11 03:01 I think 2008-09-11 03:01 not easy to get 2008-09-11 03:01 americans have dodgy taste in skates ;-) 2008-09-11 03:02 everybody is either fitness or aggressive 2008-09-11 03:02 aggressive skates are just stupid 2008-09-11 03:02 made for only one thing: sliding down rails 2008-09-11 03:02 yeah 2008-09-11 03:02 tiny little wheels 2008-09-11 03:02 don't need wheels for that 2008-09-11 03:02 heh 2008-09-11 03:03 just wear a pair of shoes with no traction :) 2008-09-11 03:03 "extreme walking" 2008-09-11 03:03 right 2008-09-11 03:03 I saw a couple of aggro skaters for the first time on the strand 2008-09-11 03:03 jumping up on things, seemed like fun 2008-09-11 03:03 heh 2008-09-11 03:04 but I can do that on my street skates too 2008-09-11 03:04 really? 2008-09-11 03:04 kind of tough to slide down rails 2008-09-11 03:04 or impossible 2008-09-11 03:04 you have enough space between the middle wheels?
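The failed assertion and the fix described above follow a common pattern: estimate the encoded size first, encode, then assert that the write pointer landed exactly at base + size. This hypothetical sketch keeps the two parts separate: a standard-attribute loop bounded by VAR_ATTRS, so it cannot emit headers for extended attributes, and a second pass that writes xattrs from a cache. Apart from MIN_ATTR and VAR_ATTRS, every name, the 1-byte kind tag and the field widths are invented for illustration and do not describe tux3's actual on-disk format; the sample values in the test echo the inode table dump earlier in the log.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute kinds: "standard" kinds are below VAR_ATTRS. */
enum { MIN_ATTR = 0, MODE_ATTR = MIN_ATTR, CTIME_ATTR, SIZE_ATTR, VAR_ATTRS };
enum { STD_SIZE = 1 + 4, XATTR_HEAD = 1 + 2 + 2 };   /* invented encodings */

struct xattr { uint16_t atom; uint16_t len; const char *body; };

struct inode_attrs {
	unsigned present;               /* bit k set => standard attr k present */
	uint32_t value[VAR_ATTRS];
	const struct xattr *xattrs;     /* contents of the xattr cache */
	unsigned nxattrs;
};

static unsigned attrs_size(const struct inode_attrs *a)
{
	unsigned size = 0;
	for (int kind = MIN_ATTR; kind < VAR_ATTRS; kind++)
		if (a->present & (1u << kind))
			size += STD_SIZE;
	for (unsigned i = 0; i < a->nxattrs; i++)
		size += XATTR_HEAD + a->xattrs[i].len;
	return size;
}

/* Write the low `bytes` bytes of v, most significant first. */
static unsigned char *put_be(unsigned char *p, uint32_t v, int bytes)
{
	while (bytes--)
		*p++ = (unsigned char)(v >> (8 * bytes));
	return p;
}

/* Encode into base, which must hold attrs_size(a) bytes; returns bytes written. */
static unsigned encode_attrs(const struct inode_attrs *a, unsigned char *base)
{
	unsigned size = attrs_size(a);
	unsigned char *attr = base;
	/* Part 1: standard attributes only; the loop stops at VAR_ATTRS. */
	for (int kind = MIN_ATTR; kind < VAR_ATTRS; kind++) {
		if (!(a->present & (1u << kind)))
			continue;
		attr = put_be(attr, kind, 1);
		attr = put_be(attr, a->value[kind], 4);
	}
	/* Part 2: extended attributes from the cache, header then body. */
	for (unsigned i = 0; i < a->nxattrs; i++) {
		const struct xattr *x = &a->xattrs[i];
		attr = put_be(attr, VAR_ATTRS, 1);
		attr = put_be(attr, x->atom, 2);
		attr = put_be(attr, x->len, 2);
		memcpy(attr, x->body, x->len);
		attr += x->len;
	}
	assert(attr == base + size);    /* estimate and encoding must agree */
	return size;
}
```

Keeping the size estimate and the encoder as two loops over the same data is what makes the final assertion a cheap but effective consistency check.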
2008-09-11 03:04 yeah
2008-09-11 03:04 no I don't grind
2008-09-11 03:04 there's a grind plate, I can get up on some things with it
2008-09-11 03:04 yeah
2008-09-11 03:04 ah
2008-09-11 03:05 sounds like you have some experience
2008-09-11 03:05 no
2008-09-11 03:05 I just put on skates for the first time in 6-7 years a few days ago
2008-09-11 03:05 they're kind of a size or size and a half too small
2008-09-11 03:05 ouch
2008-09-11 03:05 yeah
2008-09-11 03:07 started skating down the little stub walls the skateboarders grind on
2008-09-11 03:07 that seems to impress the skateboarders
2008-09-11 03:07 it's easier to do it on one foot
2008-09-11 03:07 probably looks harder though
2008-09-11 03:17 heh
2008-09-11 03:52 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3
2008-09-11 04:14 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3
2008-09-11 04:51 -!- kmeyer(~konrad@c-24-16-74-109.hsd1.mn.comcast.net) has joined #tux3
2008-09-11 05:00 flips: tux3fuse has xattrs now (not my doing)
2008-09-11 05:30 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3
2008-09-11 05:30 -!- pgquiles(~pgquiles@253.Red-83-44-239.dynamicIP.rima-tde.net) has joined #tux3
2008-09-11 05:30 -!- nataliep_(~nataliep@66-102-14-1.google.com) has joined #tux3
2008-09-11 05:30 -!- data(~data@echo489.server4you.de) has joined #tux3
2008-09-11 05:30 -!- eli(~elicriffi@66.249.86.209) has joined #tux3
2008-09-11 05:30 -!- shapor(~shapor@yzf.shapor.com) has joined #tux3
2008-09-11 05:58 -!- konrad(~konrad@c-24-16-74-109.hsd1.mn.comcast.net) has joined #tux3
2008-09-11 05:58 -!- RzM|Away(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-09-11 05:58 -!- vcgomes(~vcgomes@li17-238.members.linode.com) has joined #tux3
2008-09-11 05:58 -!- flips(~phillips@phunq.net) has joined #tux3
2008-09-11 05:58 -!- bh(~billh@ip68-107-26-122.sd.sd.cox.net) has joined #tux3
2008-09-11 07:33 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3
2008-09-11 07:48 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3
2008-09-11 09:02 -!- pgquiles(~pgquiles@253.Red-83-44-239.dynamicIP.rima-tde.net) has joined #tux3
2008-09-11 12:09 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3
2008-09-11 13:45 -!- kbingham(~kbingham@92.20.246.248) has joined #tux3
2008-09-11 14:04 -!- cdk(~chinmay@121.246.36.119) has joined #tux3
2008-09-11 14:06 konrad, that was amazingly fast of tero hmm? Nice code too.
2008-09-11 14:06 konrad, but you made this all happen, and your code was very decent as well
2008-09-11 14:07 tero is in the real pro category, lots to learn from him
2008-09-11 14:12 -!- cdk(~chinmay@121.246.36.119) has left #tux3
2008-09-11 14:15 -!- cdk(~chinmay@121.246.36.119) has joined #tux3
2008-09-11 14:19 -!- cdk(~chinmay@121.246.36.119) has joined #tux3
2008-09-11 14:31 My new boombox arrived
2008-09-11 14:31 now I can go totally ghetto, skating down to the beach with a ghetto blaster in my hand
2008-09-11 14:32 ACTION is degenerating under the influence of certain skaters
2008-09-11 15:04 haircut time
2008-09-11 15:04 later...
2008-09-11 15:37 http://linux.slashdot.org/article.pl?sid=08/09/11/1913229 <- first time I ever saw the "pigfuckers" tag on slashdot
2008-09-11 15:37 re Lenovo caving to msft on shipping linux preinstalls
2008-09-11 16:42 -!- tim_dimm(~timothyhu@adsl-67-114-40-138.dsl.scrm01.pacbell.net) has joined #tux3
2008-09-11 16:43 howdy
2008-09-11 18:06 sk8 oclock
2008-09-11 18:06 on this skate, I will think about implementation details of atom refcounts
2008-09-11 19:28 that was fun
2008-09-11 19:28 I did a move that got the sk8rs clapping
2008-09-11 19:28 then they said "ok rollerbladers are allowed"
2008-09-11 19:28 in the sk8 park that is
2008-09-11 19:29 got to grab a quick bite, then hopefully we can do chapter two of tux3 university
2008-09-11 19:29 anybody here for that?
2008-09-11 19:30 ACTION nods
2008-09-11 19:47 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-09-11 19:55 hiyah
2008-09-11 19:55 just warming up for the next episode
2008-09-11 19:55 with some pasta and a glass of cabernet
2008-09-11 19:55 ran out of chianti ;-)
2008-09-11 19:56 :D
2008-09-11 19:56 btw, what brand is that ramen you mentioned the other day?
2008-09-11 19:57 important point
2008-09-11 19:57 shin ramyun
2008-09-11 19:57 made by nog shim
2008-09-11 19:58 sorry
2008-09-11 19:58 nong shim
2008-09-11 19:58 "family pack"
2008-09-11 19:58 "gourmet spicy"
2008-09-11 19:58 overdid it a little yesterday, had three packs ;-)
2008-09-11 19:58 don't do that
2008-09-11 19:58 -!- RalucaME(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-09-11 19:58 http://kimchimamas.typepad.com/.shared/image.html?/photos/uncategorized/2007/12/13/nong_shim.jpg ?
2008-09-11 19:59 exactly
2008-09-11 20:00 I like it hot :-)
2008-09-11 20:00 seldom had better ramyun, even in korea
2008-09-11 20:00 it's actually korean of course
2008-09-11 20:00 got it from a japanese grocery
2008-09-11 20:01 -!- ebiederm(~eric@c-24-130-11-59.hsd1.ca.comcast.net) has joined #tux3
2008-09-11 20:01 hi eric
2008-09-11 20:01 so I'll have no chance to find it at Giant or Superfresh :(
2008-09-11 20:01 let me introduce you to one of the foremost kernel hackers in the known universe
2008-09-11 20:01 eric biederman
2008-09-11 20:01 say hi :-)
2008-09-11 20:01 hello all.
2008-09-11 20:02 eric is responsible for much of what makes linux great in the supercomputing cluster space
2008-09-11 20:02 ACTION also says Hello!
2008-09-11 20:02 konrad, don't be shy ;-)
2008-09-11 20:02 hi Eric
2008-09-11 20:02 hello :)
2008-09-11 20:02 well eric is not really a vfs guy, just a general genius
2008-09-11 20:03 knows everything about everything nearly
2008-09-11 20:03 :-)
2008-09-11 20:03 lol
2008-09-11 20:03 ACTION double checks that the logging is enabled
2008-09-11 20:03 hey
2008-09-11 20:03 also, nataliep_ up there is the linux kernel bug manager
2008-09-11 20:03 more folks have joined, nice
2008-09-11 20:03 ok, let's start
2008-09-11 20:04 first let me ask some questions: what does VFS stand for?
2008-09-11 20:04 virtual file system
2008-09-11 20:04 close but no
2008-09-11 20:04 subsystem? :D
2008-09-11 20:04 ACTION listens to the sound of googling
2008-09-11 20:04 hey
2008-09-11 20:04 maze!
2008-09-11 20:05 yeah, so 8pm is a little tight ;-)
2008-09-11 20:05 maze is about the smartest smart person I met at google
2008-09-11 20:05 hehe, thanks!
2008-09-11 20:05 no exaggeration
2008-09-11 20:05 ok, let's try again: what does VFS stand for?
2008-09-11 20:05 googling is ok
2008-09-11 20:05 ACTION is diluting the quality of the channel :P
2008-09-11 20:06 I doubt that, razvanm
2008-09-11 20:06 virtual file system
2008-09-11 20:06 er wait
2008-09-11 20:06 switch?
2008-09-11 20:06 that's been said hasn't it
2008-09-11 20:06 right!
2008-09-11 20:06 see?
2008-09-11 20:06 razvanm wins
2008-09-11 20:06 it stands for virtual filesystem switch
2008-09-11 20:06 versioning file system :P
2008-09-11 20:06 firefox had 'AVFS' at the top of my url bar for vfs :(
2008-09-11 20:06 how it got that name, I don't know
2008-09-11 20:06 it was the first hit for 'vfs lnux' :P
2008-09-11 20:06 eric probably does
2008-09-11 20:07 lol
2008-09-11 20:07 it switches between the different filesystems like a network switch switches between computers
2008-09-11 20:07 somebody better find out, because it's sure to come up at a geek challenge contest at linuxtag eventually
2008-09-11 20:07 yes
2008-09-11 20:07 it is a collection of methods that together implement a filesystem
2008-09-11 20:07 find out what?
2008-09-11 20:08 how it came to be called that
2008-09-11 20:08 I know where it came from but not why they picked the name. When they implemented the second filesystem on BSD they needed an abstraction layer.
2008-09-11 20:08 the vfs.txt from Documentation says: Overview of the Linux Virtual File System
2008-09-11 20:08 who came up with it
2008-09-11 20:08 etc
2008-09-11 20:08 ah
2008-09-11 20:08 trivia ;-)
2008-09-11 20:08 I knew eric would win that somehow ;-)
2008-09-11 20:08 well let me tell you
2008-09-11 20:08 the foremost filesystem dev on bsd does not know what vfs means
2008-09-11 20:08 :D
2008-09-11 20:08 or who called it that, or why
2008-09-11 20:08 yet he is definitely the foremost fs dev
2008-09-11 20:09 everybody know his name?
2008-09-11 20:09 quick...
2008-09-11 20:09 hint:
2008-09-11 20:09 I suck at trivia... I'm lucky to know my own name...
2008-09-11 20:09 McKusick?
2008-09-11 20:09 he engaged in a discussion re tux3 design recently
2008-09-11 20:09 mckusick is close but no
2008-09-11 20:10 hint: firefly
2008-09-11 20:10 Dillon?
2008-09-11 20:10 the dragonfly hammer guy?
2008-09-11 20:10 yes!
2008-09-11 20:10 Matt Dillon IIRC
2008-09-11 20:10 hammer?
2008-09-11 20:10 also responsible for linux having a reverse mapped vm
2008-09-11 20:10 used to be the bsd vm guy
2008-09-11 20:10 is now the vm fs guy
2008-09-11 20:10 and runs his own distro
2008-09-11 20:10 intensely clueful person
2008-09-11 20:10 ok
2008-09-11 20:10 let's do some vfs
2008-09-11 20:11 ACTION is ready
2008-09-11 20:11 and let's start from the opposite end that we started from yesterday
2008-09-11 20:11 everybody got their browsers ready?
2008-09-11 20:11 yesterday?
2008-09-11 20:11 eh
2008-09-11 20:11 day before yesterday
2008-09-11 20:11 last time ;-)
2008-09-11 20:11 :D
2008-09-11 20:11 loaded ;-)
2008-09-11 20:12 lxr.linux.no should be my homepage or something
2008-09-11 20:12 let's go here: http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c
2008-09-11 20:12 super.c is the "main" for a linux filesystem
2008-09-11 20:12 we might call it tux3.c for tux3, or we might go with tradition and call it super.c
2008-09-11 20:13 it's got module_{init,exit}
2008-09-11 20:13 it has two basic tasks: 1) parse the mount options 2) load the fs superblock
2008-09-11 20:13 right
2008-09-11 20:13 it takes care of a few other details besides
2008-09-11 20:13 so let's take a look at some really crappy parsing code
2008-09-11 20:14 parse_options
2008-09-11 20:14 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L428
2008-09-11 20:14 line 429
2008-09-11 20:14 oops :)
2008-09-11 20:14 depends on the version of course
2008-09-11 20:15 429 on mine as well
2008-09-11 20:15 nothing really interesting here
2008-09-11 20:15 just good to know where it is
2008-09-11 20:15 so, there isn't actually such a thing as a linux "mount" program
2008-09-11 20:15 so it gets a string and a pointer to the superblock?
2008-09-11 20:15 all we do is call the fs's mount entry point
2008-09-11 20:15 sbi
2008-09-11 20:16 not quite the same
2008-09-11 20:16 sbi is the filesystem-specific bit of a superblock
2008-09-11 20:16 so that's the in-mem representation of an ext2 superblock
2008-09-11 20:16 superblocks and inodes in linux are both generic structures
2008-09-11 20:16 almost
2008-09-11 20:16 re in-mem rep
2008-09-11 20:17 there is also an exact image of the disk superblock that ext2 keeps around
2008-09-11 20:17 I don't know if tux3 will bother
2008-09-11 20:17 we shall see, that is a fiddly detail
2008-09-11 20:17 the sbi corresponds to what is called struct sb in the tux3 userspace
2008-09-11 20:18 and tux3 doesn't really have a generic superblock implemented at the moment
2008-09-11 20:18 linux kernel does
2008-09-11 20:18 superblock fields are separated into two classes: 1) ones that core vfs knows what to do with 2) ones that only mean something to the filesystem
2008-09-11 20:18 inodes are separated the same way
2008-09-11 20:19 by a completely different mechanism, for no good reason
2008-09-11 20:19 any idea what the 0pt_ in the tokens means?
2008-09-11 20:19 the superblock specialization is via a fs-specific pointer
2008-09-11 20:19 oh, it's opt not 0pt ;-)
2008-09-11 20:19 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L395
2008-09-11 20:20 not really
2008-09-11 20:20 5 minutes of poking will answer that
2008-09-11 20:20 or 1 minute
2008-09-11 20:20 there is some fairly trivial macro magic going on here and there
2008-09-11 20:20 [I mis-parsed as zero-pt font size...]
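ext2's parse_options, linked above, walks the comma-separated mount option string and maps each option to an Opt_* token through a match table (in the kernel that table is consumed by match_token() from <linux/parser.h>). The sketch below is a userspace imitation of that pattern only: the token subset, struct mount_opts, match_opt() and the trailing "=" convention for numeric arguments are hypothetical simplifications, not the kernel's parser API.

```c
#define _DEFAULT_SOURCE		/* strsep() and strdup() */
#include <stdlib.h>
#include <string.h>

/* Hypothetical token ids, named in the kernel's Opt_ style. */
enum { Opt_sb, Opt_user_xattr, Opt_nouser_xattr, Opt_err };

struct opt_token {
	int token;
	const char *pattern;	/* "name", or "name=" when a number follows */
};

static const struct opt_token tokens[] = {
	{ Opt_sb, "sb=" },
	{ Opt_user_xattr, "user_xattr" },
	{ Opt_nouser_xattr, "nouser_xattr" },
	{ Opt_err, NULL },	/* sentinel */
};

struct mount_opts {
	unsigned long sb_block;	/* where to look for the superblock */
	int user_xattr;
};

/* Match one option against the table; point *arg at any "=value" part. */
static int match_opt(const char *opt, const char **arg)
{
	for (const struct opt_token *t = tokens; t->pattern; t++) {
		size_t len = strlen(t->pattern);
		if (t->pattern[len - 1] == '=') {
			if (!strncmp(opt, t->pattern, len)) {
				*arg = opt + len;
				return t->token;
			}
		} else if (!strcmp(opt, t->pattern)) {
			return t->token;
		}
	}
	return Opt_err;
}

/* Returns 0 on success, -1 on an unknown or malformed option. */
static int parse_options(const char *options, struct mount_opts *opts)
{
	char *copy = strdup(options), *cursor = copy, *opt;
	int ret = 0;

	if (!copy)
		return -1;
	while (!ret && (opt = strsep(&cursor, ",")) != NULL) {
		const char *arg = NULL;
		char *end;

		if (!*opt)
			continue;	/* tolerate ",," and a trailing comma */
		switch (match_opt(opt, &arg)) {
		case Opt_sb:
			opts->sb_block = strtoul(arg, &end, 10);
			if (end == arg || *end)
				ret = -1;	/* not a plain number */
			break;
		case Opt_user_xattr:
			opts->user_xattr = 1;
			break;
		case Opt_nouser_xattr:
			opts->user_xattr = 0;
			break;
		default:
			ret = -1;	/* unknown option */
		}
	}
	free(copy);
	return ret;
}
```

For example, parse_options("sb=8193,nouser_xattr", &opts) would set sb_block to 8193 and clear user_xattr; the kernel version additionally has to cope with options it must pass through or ignore.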
2008-09-11 20:20 anyway
2008-09-11 20:21 like I said, awful parsing code
2008-09-11 20:21 used to be a lot worse
2008-09-11 20:21 gets the job done in way too many lines
2008-09-11 20:21 well let's look at a more interesting bit
2008-09-11 20:21 loading the superblock
2008-09-11 20:21 quite tricky
2008-09-11 20:21 because the filesystem isn't working yet
2008-09-11 20:21 we don't even know the blocksize
2008-09-11 20:22 we have ext2_get_sb
2008-09-11 20:22 which is stored in the ext2_fs_type structure
2008-09-11 20:23 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L366
2008-09-11 20:23 of type "file_system_type"
2008-09-11 20:23 this is the starting point for any filesystem
2008-09-11 20:23 the tip of the iceberg
2008-09-11 20:23 root of the tree
2008-09-11 20:23 heart of the dragon etc
2008-09-11 20:24 file_system_type defines a few methods, by far the most important of which is get_sb
2008-09-11 20:24 this structure is passed to register_filesystem
2008-09-11 20:24 when the module is initialized
2008-09-11 20:24 which happens these days whether or not it is actually a module by the way
2008-09-11 20:25 and that makes the filesystem appear in /proc/filesystems
2008-09-11 20:25 so everybody should do cat /proc/filesystems now
2008-09-11 20:25 and tell what they see there that is really interesting
2008-09-11 20:26 lots of nodev's
2008-09-11 20:26 lots of internal no-blockdev fs'es and 4 dev-fs'es
2008-09-11 20:26 suggesting that nodev is a stupid idea...
2008-09-11 20:26 which is true
2008-09-11 20:26 and?
2008-09-11 20:26 my only non-nodev are ext3 and vfat :P
2008-09-11 20:26 well, there's ext3, hfsplus, iso9660, fuseblk
2008-09-11 20:26 right
2008-09-11 20:26 and there is no tux3
2008-09-11 20:26 and a ton of internal ones (usb, ramfs, etc...)
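The path from register_filesystem() to a line in /proc/filesystems can be modeled in a few lines of userspace C. In this sketch struct fs_type and register_fs() are hypothetical stand-ins for struct file_system_type and register_filesystem(); what mirrors the kernel is the FS_REQUIRES_DEV flag, the refusal of a duplicate name (the kernel returns -EBUSY), and the output format: "nodev" for filesystems that need no block device, then a tab and the name.

```c
#include <stdio.h>
#include <string.h>

#define FS_REQUIRES_DEV 1	/* same meaning as the kernel flag */

/* Hypothetical, stripped-down stand-in for struct file_system_type. */
struct fs_type {
	const char *name;
	int fs_flags;
	struct fs_type *next;
};

static struct fs_type *file_systems;	/* head of the registration list */

/* Append @fs to the list; refuse a duplicate name. */
static int register_fs(struct fs_type *fs)
{
	struct fs_type **p = &file_systems;

	for (; *p; p = &(*p)->next)
		if (!strcmp((*p)->name, fs->name))
			return -1;
	fs->next = NULL;
	*p = fs;
	return 0;
}

/* Write the list in /proc/filesystems format; returns bytes written. */
static size_t format_filesystems(char *buf, size_t size)
{
	size_t used = 0;

	for (struct fs_type *fs = file_systems; fs; fs = fs->next) {
		int n = snprintf(buf + used, size - used, "%s\t%s\n",
				 (fs->fs_flags & FS_REQUIRES_DEV) ? "" : "nodev",
				 fs->name);
		if (n < 0 || (size_t)n >= size - used)
			break;	/* out of room: stop at the last whole line */
		used += n;
	}
	return used;
}
```

Registering a ramfs-like type (no flags) and then an ext2-like type (FS_REQUIRES_DEV) yields "nodev\tramfs\n\text2\n", which is the shape of what cat /proc/filesystems prints.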
2008-09-11 20:26 :D
2008-09-11 20:26 that is the most important thing to notice
2008-09-11 20:27 and that is why there is a tux3 university
2008-09-11 20:27 notice also that there is a ramfs
2008-09-11 20:27 ramfs is the second most useful filesystem for learning about the vfs
2008-09-11 20:27 the most useful being ext2
2008-09-11 20:27 also sockfs
2008-09-11 20:27 is sockfs for unix domain sockets?
2008-09-11 20:27 suckfs
2008-09-11 20:27 right
2008-09-11 20:28 :-)
2008-09-11 20:28 I'd prefer a shoefs
2008-09-11 20:28 don't take anything from the net side of linux as an example of anything besides "fast"
2008-09-11 20:28 sk8fs
2008-09-11 20:28 yup
2008-09-11 20:28 I see "fuse"
2008-09-11 20:28 interesting
2008-09-11 20:28 in fact, 3 or them
2008-09-11 20:28 fuse, fuseblk, and fusectl
2008-09-11 20:28 3 of them
2008-09-11 20:29 that's a little over the top
2008-09-11 20:29 oh, this one is always a laugh: hugetlbfs
2008-09-11 20:29 a naive person would think one would be enough
2008-09-11 20:29 or would already be one too many
2008-09-11 20:29 hugetlbfs is indeed the worst fs ever conceived
2008-09-11 20:29 what's the difference between rootfs/ramfs/tmpfs?
2008-09-11 20:29 sometimes even the great penguin has bad days
2008-09-11 20:29 rootfs exists just to get linux booted
2008-09-11 20:30 probably a bad idea
2008-09-11 20:30 but that's how it works
2008-09-11 20:30 ramfs is really interesting
2008-09-11 20:30 it is basically just the vfs cache layer of a fs with all backing store stripped away
2008-09-11 20:30 it is worth reading every line
2008-09-11 20:30 is the split merely to be able to shave off more code in embedded?
2008-09-11 20:31 it is split for tutorial reasons
2008-09-11 20:31 ;-)
2008-09-11 20:31 ramfs is to serve as an example of a minimal fs with no backing store
2008-09-11 20:31 somehow it bloated up to 589 lines though
2008-09-11 20:31 when it really only needs 150 maybe
2008-09-11 20:32 so I guess somebody didn't get the memo ;-)
2008-09-11 20:32 tmpfs is the real workhorse
2008-09-11 20:32 that is basically ramfs backed by the swap device
2008-09-11 20:32 $ wc -l file-mmu.c
2008-09-11 20:32 53 file-mmu.c
2008-09-11 20:32 common mounted on /tmp these days
2008-09-11 20:32 commonly
2008-09-11 20:32 ok, I'll take a short break
2008-09-11 20:33 to refill my cabernet
2008-09-11 20:33 so tmpfs can be swapped out, while ramfs and rootfs can't
2008-09-11 20:33 linus pronounces 'vfs' as 'virtual filesystems' in ramfs/inode.c
2008-09-11 20:33 and why don't you compare notes?
2008-09-11 20:33 linus doesn't always get it right ;-)
2008-09-11 20:33 tytso would normally clobber him in a geek trivia contest
2008-09-11 20:34 http://farm1.static.flickr.com/164/413387043_ab2c7569a4.jpg :P
2008-09-11 20:34 :-)
2008-09-11 20:35 the reflection isn't quite as nice here
2008-09-11 20:35 but it does reflect, in this idea desk
2008-09-11 20:35 ikea
2008-09-11 20:35 ACTION also sits at an ikea desk ;-)
2008-09-11 20:36 ok, let's go up to ext2_fill_super
2008-09-11 20:36 we pass that as a method to a vfs library call
2008-09-11 20:36 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L737
2008-09-11 20:36 if you think that is an odd way to init a fs you'd be right ;-)
2008-09-11 20:36 so what is an sbi?
2008-09-11 20:37 ACTION waits
2008-09-11 20:37 sb info
2008-09-11 20:37 ext2_sb_info ptr
2008-09-11 20:37 right, and what points at it?
2008-09-11 20:37 sb->s_fs_info
2008-09-11 20:38 right
2008-09-11 20:38 so that is how the linux fs specializes a superblock
2008-09-11 20:38 by having s_fs_info point at something allocated and initialized by the fs
2008-09-11 20:38 that only the fs will ever use
2008-09-11 20:38 how does it know how big to make it?
2008-09-11 20:39 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L768 <- here we read the superblock
2008-09-11 20:39 MaZe: sizeof(*sbi)
2008-09-11 20:39 maze, the fs declares it, and it makes it sizeof(that)
2008-09-11 20:39 won't that be fs dependent though?
2008-09-11 20:39 it is
2008-09-11 20:39 that is why it is a fs-specific pointer field
2008-09-11 20:39 pointer is always the same size :)
2008-09-11 20:40 core vfs will never look there
2008-09-11 20:40 right
2008-09-11 20:40 oh, right it's allocated within ext2 code
2008-09-11 20:40 thank goodness for that small mercy
2008-09-11 20:40 right
2008-09-11 20:40 here http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L755
2008-09-11 20:40 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L144
2008-09-11 20:40 one can easily imagine a universe in which pointers on the same machine are not all the same size
2008-09-11 20:41 keep'em beasties far away from me...
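The s_fs_info specialization fits in a few lines of userspace C. This is a hypothetical miniature: the struct super_block here has only two fields, and myfs_sb_info / MYFS_SB() imitate ext2_sb_info and EXT2_SB() rather than reproduce them. It shows the answer to "how does it know how big to make it": the filesystem allocates sizeof(*sbi) itself, so the generic structure only ever carries an opaque pointer of fixed size.

```c
#include <stdlib.h>

/* Generic part: the only fields "core VFS" code would touch. */
struct super_block {
	unsigned long s_blocksize;
	void *s_fs_info;	/* opaque to the core, owned by the filesystem */
};

/* Filesystem-private part, loosely modeled on ext2_sb_info. */
struct myfs_sb_info {
	unsigned long s_inodes_per_group;
	unsigned long s_first_ino;
};

/* Accessor in the style of EXT2_SB(): recover the private part. */
static inline struct myfs_sb_info *MYFS_SB(struct super_block *sb)
{
	return sb->s_fs_info;
}

/* Only the filesystem knows sizeof(*sbi); the core never needs to. */
static int myfs_fill_super(struct super_block *sb)
{
	struct myfs_sb_info *sbi = calloc(1, sizeof(*sbi));

	if (!sbi)
		return -1;
	sbi->s_first_ino = 11;	/* filled from the disk superblock in real life */
	sb->s_fs_info = sbi;
	return 0;
}

/* Teardown mirrors setup: the fs frees what it allocated. */
static void myfs_put_super(struct super_block *sb)
{
	free(sb->s_fs_info);
	sb->s_fs_info = NULL;
}
```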
2008-09-11 20:41 so there is some braindamage about trying to use the "blocksize of the device" to load the superblock
2008-09-11 20:41 bad idea
2008-09-11 20:41 should just assume that it is always the same size
2008-09-11 20:41 there is no legitimate concept of blocksize on a device, actually
2008-09-11 20:42 never mind that I have coded one in my vfs emulation ;-)
2008-09-11 20:42 that is a wart I will get rid of probably one day when it irritates me enough
2008-09-11 20:42 only the fs sbi should know the blocksize of the filesystem
2008-09-11 20:43 so, that nonsense about device blocksize is so that ext2 can use "sb_bread" to read the superblock
2008-09-11 20:43 again, there is no reason for this
2008-09-11 20:43 the tux3 userspace code directly calls "diskIo" there
2008-09-11 20:43 bypassing the buffer emulation
2008-09-11 20:43 and ext2 really should do the same, not have that fragile blocksize code there
2008-09-11 20:44 not get_sb_bdev?
2008-09-11 20:44 right
2008-09-11 20:44 equivalent of tux3 diskio
2008-09-11 20:44 well
2008-09-11 20:44 these fns have a lot of cruft attached
2008-09-11 20:44 been through many iterations of doing things the wrong way
2008-09-11 20:45 so you want to go to the lowest level thing that will actually read if you want to be clear and robust here
2008-09-11 20:45 I'd be tempted to submit a bio
2008-09-11 20:45 but anyway
2008-09-11 20:45 we'll get there soon enough, and have to implement our own version of that
2008-09-11 20:45 let's do it a little more cleanly, but we don't have to save the world
2008-09-11 20:45 just now
2008-09-11 20:46 873 /* If the blocksize doesn't match, re-read the thing.. */ <- excellent example of yunk
2008-09-11 20:46 huck
2008-09-11 20:46 yuck
2008-09-11 20:46 :-)
2008-09-11 20:46 "yunk" is short for "yucky junk"
2008-09-11 20:46 and "huck" is what we will do with that in tux3
2008-09-11 20:47 so by here ext2 has managed to read its superblock: http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L898
2008-09-11 20:47 should actually have only been 3 lines, though we did do some options processing as well
2008-09-11 20:48 most of that is historical cruft
2008-09-11 20:48 keep in mind that ext2 is one of the cleanest filesystems ;-)
2008-09-11 20:48 :D
2008-09-11 20:48 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L915 <- ext2 dutifully reads the frag size, even though this bsd ufs concept was never implemented and never will be
2008-09-11 20:49 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L941 <- it checks the super magic
2008-09-11 20:49 tux3 gets to this point about 20 lines in or so
2008-09-11 20:50 a few more than that actually
2008-09-11 20:50 tux3.c
2008-09-11 20:50 but in the kernel implementation, it will be about a dozen lines from the fill_super entry
2008-09-11 20:50 as it should be
2008-09-11 20:51 next big job is to read the root directory!
2008-09-11 20:51 this is exciting because the filesystem isn't working yet
2008-09-11 20:51 wouldn't it be enough to just get the rootdir's inode number?
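The argument above is that the superblock lives at a fixed place, so one plain read of a fixed size is enough, and the blocksize comes out of the superblock instead of going in. For ext2 the superblock starts at byte 1024 of the device; s_log_block_size sits at offset 24 (blocksize = 1024 << s_log_block_size), s_magic (0xEF53) at 56, s_rev_level at 76 and s_first_ino at 84. Revision 0 superblocks have no s_first_ino and use EXT2_GOOD_OLD_FIRST_INO (11); the root directory itself is the fixed EXT2_ROOT_INO (2). The sketch decodes those fields from a raw buffer. struct sb_summary, the helper names and the upper blocksize limit are assumptions for illustration, and the frag size is skipped on purpose.

```c
#define _POSIX_C_SOURCE 200809L	/* pread() */
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define EXT2_SB_OFFSET 1024	/* byte offset of the superblock on the device */
#define EXT2_SB_SIZE 1024
#define EXT2_SUPER_MAGIC 0xEF53
#define EXT2_GOOD_OLD_REV 0
#define EXT2_GOOD_OLD_FIRST_INO 11

struct sb_summary {
	unsigned blocksize;
	unsigned first_ino;	/* first non-reserved inode */
};

/* On-disk ext2 fields are little-endian; decode byte by byte. */
static uint32_t le16_at(const unsigned char *p, size_t off)
{
	return p[off] | (uint32_t)p[off + 1] << 8;
}

static uint32_t le32_at(const unsigned char *p, size_t off)
{
	return le16_at(p, off) | le16_at(p, off + 2) << 16;
}

/* Decode a raw superblock image; returns 0, or -1 if it is not usable ext2. */
static int decode_ext2_super(const unsigned char *raw, struct sb_summary *out)
{
	uint32_t log_bs = le32_at(raw, 24);	/* s_log_block_size */

	if (le16_at(raw, 56) != EXT2_SUPER_MAGIC)	/* s_magic */
		return -1;
	if (log_bs > 6)		/* arbitrary sanity limit: 64 KiB blocks */
		return -1;
	out->blocksize = 1024u << log_bs;
	/* s_rev_level at 76; s_first_ino at 84 only means something after rev 0 */
	out->first_ino = le32_at(raw, 76) == EXT2_GOOD_OLD_REV ?
			 EXT2_GOOD_OLD_FIRST_INO : le32_at(raw, 84);
	return 0;
}

/* One plain pread at a fixed offset: no device blocksize involved. */
static int read_ext2_super(int fd, struct sb_summary *out)
{
	unsigned char raw[EXT2_SB_SIZE];

	if (pread(fd, raw, sizeof raw, EXT2_SB_OFFSET) != (ssize_t)sizeof raw)
		return -1;
	return decode_ext2_super(raw, out);
}
```

Usage would be read_ext2_super(open("/dev/sdXN", O_RDONLY), &s) on an ext2 device node (the path here is a placeholder); no blocksize has to be guessed before the read.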
2008-09-11 20:51 we need to get the root dir up and running as an inode
2008-09-11 20:51 so that open(2) and readdir work on it
2008-09-11 20:52 so yes
2008-09-11 20:52 we need to know the rootdir's inode number
2008-09-11 20:52 that has evolved over time with ext2
2008-09-11 20:52 it used to just be a fixed number
2008-09-11 20:52 now there is a fancier method
2008-09-11 20:52 for no good reason
2008-09-11 20:53 Tux3 uses inode number 0xd (for "directory" or "daniel") for the root dir
2008-09-11 20:53 http://lxr.linux.no/linux+v2.6.26.5/include/linux/ext2_fs.h#L61
2008-09-11 20:53 right
2008-09-11 20:53 somewhere there is "good ol'" something
2008-09-11 20:54 first non-reserved is 11
2008-09-11 20:54 I might have conflated that with something else
2008-09-11 20:54 #define EXT2_GOOD_OLD_FIRST_INO 11
2008-09-11 20:54 well it doesn't matter except for geek quizzes
2008-09-11 20:54 yah
2008-09-11 20:54 that's it
2008-09-11 20:54 ok, we have 6 minutes for questions
2008-09-11 20:54 going to stop right here, just before doing anything interesting ;-)
2008-09-11 20:55 exactly! :O
2008-09-11 20:55 ouch
2008-09-11 20:55 when's the next meeting? next tuesday at 8pm?
2008-09-11 20:55 this lesson was definitely shorter...
2008-09-11 20:55 well it was fun looking at all that busy looking code that doesn't actually do much, no?
2008-09-11 20:55 it seemed too little time
2008-09-11 20:55 how about tomorrow? :D
2008-09-11 20:55 next tuesday, yes
2008-09-11 20:55 yeah, tomorrow works
2008-09-11 20:55 tuesday, then?
2008-09-11 20:56 homework is: know how the root dir is loaded and initialized, and now that differs from how any other inode is opened
2008-09-11 20:56 and how
2008-09-11 20:56 I meant
2008-09-11 20:56 tomorrow is friday ;-)
2008-09-11 20:57 so what's the 'desired' way to read data off disk in a fs? submit bio-s? would that also be the best way to read the superblock (you seem to have suggested that)
2008-09-11 20:57 friday is my most productive day :P
2008-09-11 20:57 not only do I have to relax then, I have to get atom refcounting working
2008-09-11 20:57 maze, I like submit_bio, yes
2008-09-11 20:57 then you have to wait on some lock
2008-09-11 20:57 is that the lowest level interface to the block device layer?
2008-09-11 20:57 two or three lines
2008-09-11 20:57 it is
2008-09-11 20:57 the lowest one you can use without getting shouted at
2008-09-11 20:57 does it support priorities?
2008-09-11 20:57 depends on the elevator
2008-09-11 20:58 mostly linux elevators are pretty crappy
2008-09-11 20:58 no good rt elevator for example
2008-09-11 20:58 if that's what you're asking
2008-09-11 20:58 yeah, something like that
2008-09-11 20:58 feel free to write a noncrappy one
2008-09-11 20:58 you're the man to do it
2008-09-11 20:59 if I'm operating on behalf of a user, and he's running at some prio, or asking for some priority on his read/write/file op, then I'd like to be able to pass that down to the blockdev layer
2008-09-11 20:59 yes, and save us from that broken pos that is the current io scheduler
2008-09-11 20:59 I mean I obviously shouldn't be dealing with that in the fs, except for making sure I submit requests with the right priorities
2008-09-11 20:59 no, not in the fs
2008-09-11 20:59 though one can imagine the fs making suggestions
2008-09-11 21:00 and a realtime fs most certainly has to interact with the io scheduler
2008-09-11 21:00 (submit_bio is used only by xfs, ocfs2, jfs, gfs2 and ext4)
2008-09-11 21:00 the fs also has to answer the question "can I submit this request at all, and meet the constraints"
2008-09-11 21:00 what about networking? how would you go about sending/receiving udp? tcp? raw frames? other protocol?
2008-09-11 21:00 (what do the others use?)
2008-09-11 21:01 only the fs can know certain crucial information about those constraints
2008-09-11 21:01 networking?
2008-09-11 21:01 sorry, missed the connection
2008-09-11 21:01 you mean realtime?
2008-09-11 21:01 [have to be careful - low prio process fetches a directory, higher priority process then needs to fetch it again - needs to result in increasing the bio priority or resubmitting it or something]
2008-09-11 21:01 razvanm, notice that submit_bio is used in all _modern_ fs's
2008-09-11 21:01 networking connection - I'm imagining a disk and network based multi-node fs
2008-09-11 21:02 I'm imaginative ;-)
2008-09-11 21:02 gfs2 only loosely meeting that definition
2008-09-11 21:02 flips: right :D
2008-09-11 21:02 maze, you're already fixing IO priority inversion?
2008-09-11 21:03 ah
2008-09-11 21:03 right
2008-09-11 21:03 no, just pointing out you have to be careful
2008-09-11 21:03 that kind of networking
2008-09-11 21:03 you do
2008-09-11 21:03 and as a rule we are not
2008-09-11 21:03 far from it
2008-09-11 21:03 tcp/ip is not realtime
2008-09-11 21:03 however
2008-09-11 21:03 so there's a lot of things I'd like to work on if I had the time ;-)
2008-09-11 21:03 you can kinda sorta pretend it is, sometimes
2008-09-11 21:03 networking is real-time if you have caching done correctly ;-)
2008-09-11 21:03 right
2008-09-11 21:04 really?
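The caution above (a low-priority process starts reading a directory, then a higher-priority process needs the same blocks) can be sketched as "join and boost": the second reader attaches to the read already in flight and raises its priority instead of queuing a duplicate behind it. Everything here is hypothetical: struct pending_read, read_block_async(), the fixed-size table, and the rule that a lower number is more urgent (loosely like ioprio levels, where 0 is highest). In a kernel the boost would also have to reach the I/O scheduler, which this sketch does not model.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16

/* Hypothetical in-flight read; lower prio value means more urgent. */
struct pending_read {
	uint64_t block;
	int prio;
	int waiters;
	int in_use;
};

static struct pending_read pending[MAX_PENDING];

/* Join an in-flight read of @block (boosting it if this caller is more
 * urgent) or start a new one. Returns the request, or NULL if the table
 * is full. */
static struct pending_read *read_block_async(uint64_t block, int prio)
{
	struct pending_read *free_slot = NULL;

	for (size_t i = 0; i < MAX_PENDING; i++) {
		struct pending_read *r = &pending[i];

		if (r->in_use && r->block == block) {
			if (prio < r->prio)
				r->prio = prio;	/* avoid priority inversion */
			r->waiters++;
			return r;
		}
		if (!r->in_use && !free_slot)
			free_slot = r;
	}
	if (free_slot)
		*free_slot = (struct pending_read){ block, prio, 1, 1 };
	return free_slot;
}

/* I/O completion: wake the waiters (not modeled) and free the slot. */
static void complete_read(struct pending_read *r)
{
	r->in_use = 0;
	r->waiters = 0;
}
```

The alternative mentioned in the chat, resubmitting at the higher priority, avoids touching a request the scheduler already owns, at the cost of a possible duplicate read.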
2008-09-11 21:04 you will have to convince me of that
2008-09-11 21:04 I think that random backoff already makes it not realtime
2008-09-11 21:04 CSMA/CD
2008-09-11 21:04 or something like that
2008-09-11 21:04 oh, ok, I don't mean RT as in rtlinux rt
2008-09-11 21:04 carrier sense multiple access collision detect
2008-09-11 21:04 I meant usable on a desktop
2008-09-11 21:04 ah
2008-09-11 21:04 I always mean actual rt when somebody says rt
2008-09-11 21:05 flips: if the data is represented identified by some hashes over them then it could be ;-)
2008-09-11 21:05 I meant usable and not get killed by background tasks
2008-09-11 21:05 there linus and I differ
2008-09-11 21:05 (for reading)
2008-09-11 21:05 uhm, I never mentioned rt ;-)
2008-09-11 21:05 razvanm, what could be?
2008-09-11 21:05 maze, ok
2008-09-11 21:05 sorry
2008-09-11 21:05 try again?
2008-09-11 21:05 too many threads :D
2008-09-11 21:05 yup
2008-09-11 21:05 while rt is nice of course, and you should design with making it possible in the future of course
2008-09-11 21:05 the phillips switch is overloading
2008-09-11 21:06 I just wanted fg tasks to be able to run at higher priority than bg tasks (a garbage collector or bg file scan or ...)
2008-09-11 21:06 ;-)
2008-09-11 21:06 ok, well a single node filesystem has no business knowing anything about networking
2008-09-11 21:06 right
2008-09-11 21:06 yes, you have control over that
2008-09-11 21:06 complete control
2008-09-11 21:06 you are root
2008-09-11 21:06 beyond root
2008-09-11 21:06 we've already determined that a fs has to provide some interfaces to the vfs layer, and it interfaces with the blockdev layer via bio's
2008-09-11 21:07 there's only one limitation to what a filesystem in linux can do: it can't use symbols that are not exported to modules, when it is compiled as a module
2008-09-11 21:07 add in some atomics/locks/primitives already provided by the kernel and mem management, and you have all pieces ;-)
2008-09-11 21:07 yes
2008-09-11 21:07 watch out for layer violations
2008-09-11 21:07 but in general, go crazy
2008-09-11 21:08 so basically, now the question was: how to implement nfs - what would the interface not to blockdev, but to network, be?
2008-09-11 21:08 there is not much to do
2008-09-11 21:08 nfs basically runs on top of a filesystem that doesn't even have to know it's there
2008-09-11 21:08 there are a few small, weird hooks
2008-09-11 21:08 uhm?
2008-09-11 21:08 the details of which I forget
2008-09-11 21:08 nfs stacks on top of a host fs
2008-09-11 21:09 the host fs doesn't have to know it's being stacked on
2008-09-11 21:09 it just has to behave itself
2008-09-11 21:09 like a unix fs
2008-09-11 21:09 what do you mean by host fs? oh you mean for the nfs server?
2008-09-11 21:09 that's actually pretty hard ;p)
2008-09-11 21:09 I was thinking about the nfs client
2008-09-11 21:09 right
2008-09-11 21:09 ah, nfs client
2008-09-11 21:09 strange exception to pretty much everything
2008-09-11 21:09 it stacks on top of a remote host fs
2008-09-11 21:10 with all the oddities that implies
2008-09-11 21:10 including mid-flight reboots
2008-09-11 21:10 indeed
2008-09-11 21:10 there are papers written about how much this sucks
2008-09-11 21:10 let me see
2008-09-11 21:10 http://www.cc.gatech.edu/classes/AY2007/cs4210_fall/papers/nfsOLS.pdf
2008-09-11 21:10 the reboot? yeah, that's terrible, but it can be done in a way that it would work
2008-09-11 21:10 marginally
2008-09-11 21:10 you'd detect remote server reboot and have to dump caches, etc...
2008-09-11 21:11 I've been living/breathing that for the last 3 years
2008-09-11 21:11 I know ;-)
2008-09-11 21:11 yes, but we don't
2008-09-11 21:11 it's pathetic
2008-09-11 21:11 nobody pays attention to statd
2008-09-11 21:11 except lockd
2008-09-11 21:11 no excuse
2008-09-11 21:11 oh, I'm not thinking about NFS, I hate NFS, I'm thinking about a networkfs
2008-09-11 21:11 sun braindamage
2008-09-11 21:11 and linux too, because we should have fixed it by now
2008-09-11 21:12 oh a real networkfs
2008-09-11 21:12 just trying to figure out what the layering is there: vfs / networkfs / (this missing interface layer) / networking
2008-09-11 21:12 well, lustre is getting close
2008-09-11 21:12 ocfs2 also
2008-09-11 21:12 I'm sure you will crack that one
2008-09-11 21:12 will be fun to watch your progress
2008-09-11 21:12 in the meantime, goals with tux3 are modest
2008-09-11 21:12 I need more than 24 hours in a day
2008-09-11 21:13 that is: support nfs no worse than any other filesystem
2008-09-11 21:13 hopefully much better
2008-09-11 21:13 hehe
2008-09-11 21:13 ebiederm, thanks for visiting
2008-09-11 21:14 I hope we did not disappoint ;-)
2008-09-11 21:14 an OT question: why hg and not git?
2008-09-11 21:14 ok, it is back to the question of atom refcounting
2008-09-11 21:14 you been following the thread, maze?
2008-09-11 21:15 sorry, which thread?
2008-09-11 21:15 razvanm, hg is a lot more usable than git
2008-09-11 21:15 about mercurial?
2008-09-11 21:15 instant on
2008-09-11 21:15 maze, no, about xattr atoms
2008-09-11 21:15 ah, no.
2008-09-11 21:15 on the tux3 list
2008-09-11 21:15 should I?
2008-09-11 21:15 please
2008-09-11 21:15 you subscribed?
2008-09-11 21:16 glancing ;-)
2008-09-11 21:16 I think I subscribed you
2008-09-11 21:16 more xattr design details?
2008-09-11 21:16 right, and associated posts
2008-09-11 21:16 the parent of that is the root of that tree
2008-09-11 21:17 uhm, gmail doesn't do trees ;-)
2008-09-11 21:17 they should fix that
2008-09-11 21:17 :p
2008-09-11 21:17 it's only beta
2008-09-11 21:17 right, it's also slow...
2008-09-11 21:17 let me see
2008-09-11 21:17 I know, I run exim4 here and it's beyond fast
2008-09-11 21:17 it's scary
2008-09-11 21:18 so I'm a big fan of atoms, because the space saving can be extreme
2008-09-11 21:18 [Tux3] The long and short of extended attributes
2008-09-11 21:18 ah, I like the sound of that
2008-09-11 21:18 you probably want to support even more atoms for selinux... but then the code gets complex
2008-09-11 21:18 I've been doing a lot of introspecting about it
2008-09-11 21:19 so you have the easy solution - use no atoms
2008-09-11 21:19 always on the verge of mass deleting that code
2008-09-11 21:19 I know, but I also feel it's lame
2008-09-11 21:19 and just store rep { string=string }
2008-09-11 21:19 no null's thanks ;-)
2008-09-11 21:19 ext3 is 8 bit clean
2008-09-11 21:19 but otherwise yes
2008-09-11 21:19 (mind you I'd actually store that in reversed order, at the front of the file, going backwards towards negative offsets)
2008-09-11 21:20 reccount, namecount, ,
2008-09-11 21:20 have it stored the same way as the rest of the file data
2008-09-11 21:20 ?
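Because tux3 names and values are meant to be 8-bit clean ("no nulls"), a stored "string=string" record cannot use NUL or '=' as a separator; lengths have to carry the structure. A minimal sketch of such a record, assuming a hypothetical layout of a 16-bit little-endian name length, a 16-bit value length, then the raw bytes. This is not the tux3 on-disk format, which the discussion above leaves open.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record: u16 name length, u16 value length, then raw bytes.
 * Explicit lengths make it 8-bit clean: bytes may include '\0' or '='. */
static void put_u16(unsigned char *p, uint16_t v)
{
	p[0] = v & 0xff;
	p[1] = v >> 8;
}

static uint16_t get_u16(const unsigned char *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}

/* Append one record; returns bytes written, or 0 if it does not fit. */
static size_t xattr_encode(unsigned char *buf, size_t room,
			   const void *name, uint16_t nlen,
			   const void *value, uint16_t vlen)
{
	size_t need = 4 + (size_t)nlen + vlen;

	if (need > room)
		return 0;
	put_u16(buf, nlen);
	put_u16(buf + 2, vlen);
	memcpy(buf + 4, name, nlen);
	memcpy(buf + 4 + nlen, value, vlen);
	return need;
}

/* Parse the record at @buf; returns its size, or 0 if it is truncated. */
static size_t xattr_decode(const unsigned char *buf, size_t avail,
			   const unsigned char **name, uint16_t *nlen,
			   const unsigned char **value, uint16_t *vlen)
{
	if (avail < 4)
		return 0;
	*nlen = get_u16(buf);
	*vlen = get_u16(buf + 2);
	if (4 + (size_t)*nlen + *vlen > avail)
		return 0;
	*name = buf + 4;
	*value = buf + 4 + *nlen;
	return 4 + (size_t)*nlen + *vlen;
}
```

A value such as "x\0y" survives a round trip intact, which a NUL-terminated "name=value" string could not guarantee.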
2008-09-11 21:20 xattr1=value1 xattr2=value2 filecontent="hello" ==>
2008-09-11 21:21 sorry, I meant tux3 is 8 bit clean
2008-09-11 21:21 where are the negative offsets?
2008-09-11 21:21 oh I see
2008-09-11 21:21 2eulav=2rttax 1eulav=1rttax hello
2008-09-11 21:21 | offset 0 at [H] in hello
2008-09-11 21:21 demented ;-)
2008-09-11 21:21 interesting idea
2008-09-11 21:21 it means you don't have to implement it though ;-)
2008-09-11 21:22 well the page cache doesn't have negative offsets
2008-09-11 21:22 you'd have to store at the top of the index range
2008-09-11 21:22 that's a good idea
2008-09-11 21:22 it should work out fine
2008-09-11 21:22 means you can't quite have a 16 TB file on 32 bit linux though
2008-09-11 21:23 16 TB less the maximum size of attributes
2008-09-11 21:23 no, you shave it down by however many xattrs you have
2008-09-11 21:23 so maybe a few kilobytes - in the future maybe more... who knows
2008-09-11 21:23 ok, that's twisted enough for me
2008-09-11 21:23 in what sense is it twisted?
2008-09-11 21:23 works perfectly on 64 bit linux... probably find a couple of radix tree bugs
2008-09-11 21:24 eking out a small simplification by using the other end of the address range
2008-09-11 21:24 twisted
2008-09-11 21:24 I like it
2008-09-11 21:24 right you have to be signedness clean, or you can offset everything by a zero offset constant
2008-09-11 21:24 right
2008-09-11 21:24 like I say
2008-09-11 21:24 probably turn up a couple core linux bugs there
2008-09-11 21:24 but worth doing just for that reason
2008-09-11 21:24 or you can even just store it like this 0:hello empty space for expansion reverse xattrs :-1
2008-09-11 21:25 since you have to support holes anyway...
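The 16 TB figure comes from the page cache index: on 32-bit Linux the index (pgoff_t) is 32 bits wide, so with 4 KiB pages it spans 2^32 pages of 4096 bytes, 2^44 bytes or 16 TiB. Putting xattrs at the top of that range, growing downward while file data grows upward from index 0, costs exactly the pages the xattrs occupy. A small sketch of that arithmetic, assuming 4 KiB pages and a 32-bit index; the helper names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12		/* 4 KiB pages */
#define PGOFF_MAX UINT32_MAX	/* 32-bit page index, as on 32-bit Linux */

/* Xattr page k lives k pages below the top of the index range. */
static uint32_t xattr_page_index(uint32_t k)
{
	return PGOFF_MAX - k;
}

/* Largest byte size file data can reach when @xattr_pages are reserved. */
static uint64_t max_data_bytes(uint32_t xattr_pages)
{
	uint64_t pages = (uint64_t)PGOFF_MAX + 1 - xattr_pages;

	return pages << PAGE_SHIFT;
}

/* Would a data page at @index run into the reserved xattr area? */
static int collides(uint32_t index, uint32_t xattr_pages)
{
	return xattr_pages && index >= xattr_page_index(xattr_pages - 1);
}
```

With no xattr pages the limit is 17592186044416 bytes (16 TiB); each reserved page shaves off 4096 bytes, matching the "shave it down by however many xattrs you have" point. On 64-bit Linux the same layout leaves the data range effectively untouched.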
2008-09-11 21:25 sure 2008-09-11 21:25 it allows us to treat xattrs more like file data in kernel 2008-09-11 21:25 that's a tux3 meme 2008-09-11 21:25 exactly 2008-09-11 21:25 so I like it 2008-09-11 21:26 it means xattr support in the fs on-disk image is basically free 2008-09-11 21:26 for now we have the "xcache" 2008-09-11 21:26 which is even faster to access than a page cache mapping page 2008-09-11 21:26 well 2008-09-11 21:26 hmm 2008-09-11 21:26 is it? 2008-09-11 21:26 somewhat 2008-09-11 21:26 I think it's mostly free 2008-09-11 21:26 gets close 2008-09-11 21:26 I was going to have separate btree for big xattrs 2008-09-11 21:27 and small ones go inthe inode, just like immediate file data 2008-09-11 21:27 (still imagining a world with just one btree) 2008-09-11 21:27 but mapping intermediate sized attributes into the top of the file address space is a possibility 2008-09-11 21:27 theoretically you can put almost all file metadata at the -1 point 2008-09-11 21:27 not only xattrs 2008-09-11 21:27 thejust one btree idea has already been done, it's called hammer 2008-09-11 21:27 not sure how that would work for performance 2008-09-11 21:28 but you'd get versioning for free 2008-09-11 21:28 I think that two level btree is significantly more cache efficient 2008-09-11 21:28 I've played with mapping file metadata into the file address space before 2008-09-11 21:28 perhaps. 2008-09-11 21:28 without joy 2008-09-11 21:28 spent a lot of mental energy on it, found no real wins 2008-09-11 21:28 where are the problems? 2008-09-11 21:29 finding a reason to do it 2008-09-11 21:29 an example that runs faster 2008-09-11 21:29 yeah, it's probably worth optimizing the hell out of inode stat time 2008-09-11 21:29 stat time? 2008-09-11 21:29 ah 2008-09-11 21:30 yes 2008-09-11 21:30 how fast you can stat a bunch of inodes 2008-09-11 21:30 tux3 is going to work very well there 2008-09-11 21:30 basically just run down the inode table 2008-09-11 21:30 wait a minute, it's a table? 
not a btree? 2008-09-11 21:30 and the inode table will be intitionally laid out in a clumpy way 2008-09-11 21:30 it's a btree 2008-09-11 21:30 oh, ok. 2008-09-11 21:30 call it a table for historical reasons 2008-09-11 21:31 variable size inodes 2008-09-11 21:31 a tux3 exclusive, maybe 2008-09-11 21:31 really defines the design and implementation 2008-09-11 21:31  2) Refcount all atoms and delete any that fall to zero <- my vote 2008-09-11 21:31 mine too 2008-09-11 21:31 just challenging to do as fast as the crude approach 2008-09-11 21:31 possibly delaying cleanup till unmount, not sure if that would ease up anything though 2008-09-11 21:32 tux3 has the concept of log rollup 2008-09-11 21:32 I'll be posting about that in much more detail over the next week or so 2008-09-11 21:32 it's continuous cleanup 2008-09-11 21:32 doesn't have to be a flurry of cleanup wither at umount or mount 2008-09-11 21:32 or remount after crash even 2008-09-11 21:33 you can actually put it in the btree ;-) 2008-09-11 21:33 why? 2008-09-11 21:33 you want search through it to be efficient - both ways 2008-09-11 21:33 oh right 2008-09-11 21:33 both atom -> string conversion and string -> atom conversion 2008-09-11 21:33 interesting idea 2008-09-11 21:33 oh 2008-09-11 21:33 I thought you meant the log 2008-09-11 21:33 have some reserved btree prefix 2008-09-11 21:34 of course the atom table will be a btree 2008-09-11 21:34 it will be an HTree in facrt 2008-09-11 21:34 fact 2008-09-11 21:34 the log? 
yeah though about how the log could be in the btree 2008-09-11 21:34 even had some half-baked concept, but didn't think about it long enough to really know if that's worth even thinking about 2008-09-11 21:34 turns out that the deficiencies of HTree that make it tough to implement readdir accurately don't apply at all to the xattr atom use case 2008-09-11 21:35 and htree is just about optimal for that 2008-09-11 21:35 atom->string is just an array, since there's no holes 2008-09-11 21:35 as far as reverse conversion goes... 2008-09-11 21:35 there are two ideas I'm considering 2008-09-11 21:35 one is to use the address of the dirent as the atom number 2008-09-11 21:36 this decreases thedensity of the atom space somewaht 2008-09-11 21:36 huh, how does that work? 2008-09-11 21:36 by a factor of 4 to be precise 2008-09-11 21:36 oh, right, I think I see 2008-09-11 21:36 just look up the dirent and return the offset fromthe beginning of the file as the atom number 2008-09-11 21:36 have the atoms themselves be pointers 2008-09-11 21:36 cute 2008-09-11 21:36 ACTION has to put a different keyboard onthis machine with a better space bar 2008-09-11 21:36 right 2008-09-11 21:37 the other option is to have a reverse lookup table, that points back at the dirents 2008-09-11 21:37 potentially div 4 or something to make em more likely to fit in a byte 2008-09-11 21:37 I favor the second 2008-09-11 21:37 because I like the atoms to be as dense as possible 2008-09-11 21:37 for compression reasons 2008-09-11 21:37 I already took the div4 into account ;-) 2008-09-11 21:37 I'm still not convinced compression of this part of the fs really matters... 
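Option 2 from the list post is to refcount every atom and delete any whose count falls to zero. Combined with the dense numbering and reverse lookup table favored above, it can be sketched in memory as follows. This is an illustration only: a Python dict stands in for the HTree that would map strings to atoms, and the 16-bit atom width is an assumption. `dirent_offset_atom` shows the competing scheme, in which the atom is the dirent's byte offset divided by its 4-byte alignment.

```python
from typing import Dict, List, Optional


class AtomTable:
    """Refcounted, densely numbered string <-> atom table (in-memory sketch).

    Atoms are handed out from 0 upward and recycled once their refcount
    falls to zero, keeping the numbers small enough for a narrow field.
    """

    def __init__(self, atom_bits: int = 16):
        self.limit = 1 << atom_bits              # 16 bits -> 65,536 atoms
        self.names: List[Optional[bytes]] = []   # atom -> string (a plain array)
        self.refs: List[int] = []                # atom -> reference count
        self.atoms: Dict[bytes, int] = {}        # string -> atom (reverse lookup)
        self.free: List[int] = []                # recycled atom numbers

    def get(self, name: bytes) -> int:
        """Look up (or create) the atom for name and take a reference on it."""
        atom = self.atoms.get(name)
        if atom is None:
            if self.free:
                atom = self.free.pop()           # its refcount is already 0
                self.names[atom] = name
            elif len(self.names) < self.limit:
                atom = len(self.names)
                self.names.append(name)
                self.refs.append(0)
            else:
                raise OverflowError("atom space exhausted")
            self.atoms[name] = atom
        self.refs[atom] += 1
        return atom

    def put(self, atom: int) -> None:
        """Drop a reference and delete the atom when its count falls to zero."""
        if self.refs[atom] <= 0:
            raise ValueError("refcount underflow")
        self.refs[atom] -= 1
        if self.refs[atom] == 0:
            del self.atoms[self.names[atom]]
            self.names[atom] = None
            self.free.append(atom)

    def name(self, atom: int) -> bytes:
        """atom -> string is a direct array index."""
        name = self.names[atom]
        if name is None:
            raise KeyError(atom)
        return name


def dirent_offset_atom(offset: int, align: int = 4) -> int:
    """Alternative scheme: atom = byte offset of the name's dirent / alignment."""
    if offset % align:
        raise ValueError("dirent offsets are assumed to be 4-byte aligned")
    return offset // align
```

With 4-byte-aligned dirents, a 16-bit dirent-offset atom can address only 2^16 * 4 bytes = 256 KiB of atom directory. Dense numbering with free-slot reuse gets the full 65,536 atoms out of the same field.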
2008-09-11 21:38 sure it does 2008-09-11 21:38 atom number field is current 16 bits 2008-09-11 21:38 64K atoms 2008-09-11 21:38 before having to go to a 32 bit atom number 2008-09-11 21:38 that's comfortable 2008-09-11 21:38 -!- stargazr5(~gauravstt@59.95.38.255) has joined #tux3 2008-09-11 21:38 14 bits not so much 2008-09-11 21:38 still 2008-09-11 21:39 could go either way on that 2008-09-11 21:39 Terrible hack: 2008-09-11 21:39 $ getfattr -n user.hash -e text -h --absolute-names -L xhash 2008-09-11 21:39 # file: xhash 2008-09-11 21:39 user.hash="1114234:1191:1219215805:e233bf8dd0415ec9b7fea0193803357c:6325f0060bd5f23cf6ba106fd6500efa76d9bc5e" 2008-09-11 21:39 Storing mtime/md5sum/sha1sum in a xattr for fs recovery ;-) 2008-09-11 21:39 got to decide by midnight ;-) 2008-09-11 21:39 ? 2008-09-11 21:40 so I store the mtime:md5sum:sha1sum of each file on my drive in a xattr for that file 2008-09-11 21:40 I get constant time md5sum calculation on files 2008-09-11 21:40 ease of verifying file integrity 2008-09-11 21:40 cool 2008-09-11 21:40 and I can verify integrity of files in case of fs crash (ie. like when I upgraded to 2.6.27-rc3) 2008-09-11 21:41 I think I like it more than zfs "checksum everything" mentality 2008-09-11 21:41 makes sense to only checksum logically 2008-09-11 21:41 and yes it does need to be regenerated on file modifications, so the newest files lack it 2008-09-11 21:41 sha1 is ok only if you want crytographic verifiability, otherwise it's slower than necessary 2008-09-11 21:42 compare that with my laptop 20mb/s read speed... 2008-09-11 21:42 and it doesn't matter 2008-09-11 21:42 it matters if you're running a server 2008-09-11 21:42 a lot 2008-09-11 21:42 true 2008-09-11 21:42 option? 
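The `user.hash` trick above stores an integrity record on each file. The following minimal sketch assumes a simplified `mtime:md5:sha1` record. The value shown in the log carries two extra leading fields whose meaning is not stated, so they are omitted here. `stamp` and `verify` rely on Python's Linux-only `os.setxattr`/`os.getxattr` and on a filesystem mounted with user xattr support.

```python
import hashlib
import os

XATTR_NAME = "user.hash"   # attribute name from the log; the record layout is an assumption


def hash_record(data: bytes, mtime: int) -> str:
    """Build an 'mtime:md5:sha1' record for a file's contents."""
    return f"{mtime}:{hashlib.md5(data).hexdigest()}:{hashlib.sha1(data).hexdigest()}"


def check_record(record: str, data: bytes, mtime: int) -> str:
    """Return 'ok', 'stale' (modified since the record was made) or 'corrupt'."""
    rec_mtime, md5, sha1 = record.split(":")
    if int(rec_mtime) != mtime:
        return "stale"
    if hashlib.md5(data).hexdigest() == md5 and hashlib.sha1(data).hexdigest() == sha1:
        return "ok"
    return "corrupt"


def stamp(path: str) -> None:
    """Store the record on the file itself (Linux; needs user_xattr support)."""
    with open(path, "rb") as f:
        data = f.read()
    mtime = int(os.stat(path).st_mtime)
    os.setxattr(path, XATTR_NAME, hash_record(data, mtime).encode())


def verify(path: str) -> str:
    """Recompute the digests and compare them with the stored record."""
    with open(path, "rb") as f:
        data = f.read()
    record = os.getxattr(path, XATTR_NAME).decode()
    return check_record(record, data, int(os.stat(path).st_mtime))
```

A mismatched mtime means the file was legitimately rewritten and the record needs regenerating, which is why the newest files lack it. Only a matching mtime with mismatched digests points at corruption.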
2008-09-11 21:42 right 2008-09-11 21:42 probably include something like crc64 or whatever cheap 64-bit hash you can find 2008-09-11 21:43 (no idea what a fast good 64-bit hash is nowadays) 2008-09-11 21:43 crc is bad 2008-09-11 21:43 funnels to hell 2008-09-11 21:43 dx_hack_hash is getting closer 2008-09-11 21:43 uses a hacked lfsr idea 2008-09-11 21:43 needs analysis 2008-09-11 21:44 maze, you'd be good at that 2008-09-11 21:44 I think 2008-09-11 21:44 we appear to have scared of everybody else... noone is asking any other questions ;-) 2008-09-11 21:44 yeah 2008-09-11 21:44 analysis of speed? or of hash spread? 2008-09-11 21:44 and they're the ones who actually check in code ;-) 2008-09-11 21:45 got to be careful about that 2008-09-11 21:45 hash spread 2008-09-11 21:45 etc 2008-09-11 21:45 I'm in the middle of a cluster turn up... 2008-09-11 21:45 speed is about optimal 2008-09-11 21:45 I made sure of that 2008-09-11 21:45 well 2008-09-11 21:45 truth be told I could make it much faster 2008-09-11 21:45 hopefully I can at least provide 'inspiration' or something 2008-09-11 21:45 it's meant for hashing short strings with good spread 2008-09-11 21:46 short, very nonrandom strings 2008-09-11 21:46 does a good job of that 2008-09-11 21:46 re: atom refcounting 2008-09-11 21:46 you don't have to sync it to disk really if you are hacky/smart about it 2008-09-11 21:47 I'm going to post the results of my design thinking from the skate earlier 2008-09-11 21:47 really? 
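For the hash-spread analysis suggested above, here is a Python transcription of `dx_hack_hash`, based on our understanding of the unsigned-char variant in Linux's fs/ext4/hash.c. Check it against the kernel source before relying on it. A chi-square bucket test follows. Buckets are chosen by the top bits because the final `<< 1` makes the low bit always zero, so `h % 64` would leave every odd bucket empty.

```python
from typing import List

MASK32 = 0xFFFFFFFF


def dx_hack_hash(name: bytes) -> int:
    """Python transcription of dx_hack_hash (unsigned-char variant)."""
    hash0, hash1 = 0x12A3FE2D, 0x37ABE8F9
    for c in name:                      # iterating bytes yields ints 0..255
        h = (hash1 + (hash0 ^ (c * 7152373))) & MASK32
        if h & 0x80000000:
            h -= 0x7FFFFFFF             # h >= 2**31 here, so this cannot go negative
        hash1, hash0 = hash0, h
    return (hash0 << 1) & MASK32        # low bit is always 0


def bucket_chi_square(hashes: List[int], bits: int = 6) -> float:
    """Chi-square statistic of 32-bit hashes sorted into 2**bits buckets by their TOP bits."""
    if not hashes:
        raise ValueError("need at least one hash")
    buckets = 1 << bits
    counts = [0] * buckets
    for h in hashes:
        counts[h >> (32 - bits)] += 1
    expected = len(hashes) / buckets
    return sum((c - expected) ** 2 / expected for c in counts)
```

For a well-spread hash, the statistic over 2**bits buckets should land near 2**bits - 1. Short, nonrandom inputs such as `file0` through `file9999` make an interesting test set.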
2008-09-11 21:47 since you can put it in the log 2008-09-11 21:47 sounds like magic 2008-09-11 21:47 of course 2008-09-11 21:47 planned 2008-09-11 21:47 or I wouldn't have gone this route at all 2008-09-11 21:47 and if the order is right, then it can never get out of sync 2008-09-11 21:47 again, of course 2008-09-11 21:47 and the entire thing should be small enough you can periodically just write out a new copy 2008-09-11 21:47 I've been computing the exact percentages of log bandwdith that will be required ;-) 2008-09-11 21:47 of the entire thing 2008-09-11 21:48 again, of course 2008-09-11 21:48 but we don't 2008-09-11 21:48 we even do that incrementally 2008-09-11 21:48 and: you can afford to lose decrements, since at most the ref counts will be too high 2008-09-11 21:48 and arrange the structure that have to be updated to be close together 2008-09-11 21:48 and compact 2008-09-11 21:48 ah 2008-09-11 21:48 which is kind of dirty... 2008-09-11 21:48 really? 2008-09-11 21:48 way dirty 2008-09-11 21:48 but there is likely something there 2008-09-11 21:49 you can't lose track long term 2008-09-11 21:49 that would be bad 2008-09-11 21:49 -!- tim_dimm(~timothyhu@adsl-67-114-40-138.dsl.scrm01.pacbell.net) has joined #tux3 2008-09-11 21:49 but you can do a false-positive-to-be-tested-later kind of thing 2008-09-11 21:49 I'd guess most fs'es will have 2 dozen or less atoms 2008-09-11 21:49 he tim_dimm 2008-09-11 21:49 welcome back, daddy! 2008-09-11 21:49 hey 2008-09-11 21:49 wassap? 2008-09-11 21:49 well 2008-09-11 21:50 they're doing good 2008-09-11 21:50 we just did episode 2 of tux3 university 2008-09-11 21:50 how'd I do, maze? 2008-09-11 21:50 ah, missed it! 2008-09-11 21:50 keeping your interesting, hit the right level? 2008-09-11 21:50 ok, though I think the first was more action packed 2008-09-11 21:50 enough swear words? too many? 
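The argument above is that refcount changes can ride in the log. A lost decrement only leaves a count too high, and a later pass can catch that. The sketch below covers both halves, assuming hypothetical `(atom, delta)` log records and a recount pass that walks the atoms each inode references.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def replay(checkpoint: Dict[int, int], log: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Rebuild refcounts by applying logged (atom, delta) records in log order."""
    counts = dict(checkpoint)
    for atom, delta in log:
        counts[atom] = counts.get(atom, 0) + delta
    return counts


def scrub(counts: Dict[int, int],
          inode_atoms: Iterable[List[int]]) -> Tuple[Dict[int, int], Set[int]]:
    """Recount references held by inodes; return (corrected counts, freeable atoms).

    Counts that are too high (a lost decrement) are harmless: the atom merely
    lingers until this pass. Counts that are too low would let an atom be freed
    while still in use, so they are treated as a bug.
    """
    actual = Counter(atom for atoms in inode_atoms for atom in atoms)
    for atom, n in actual.items():
        if counts.get(atom, 0) < n:
            raise RuntimeError(f"atom {atom}: refcount {counts.get(atom, 0)} < {n} references")
    corrected = {atom: actual.get(atom, 0) for atom in counts}
    freeable = {atom for atom, n in corrected.items() if n == 0}
    return corrected, freeable
```

Replay order matters only in that increments must never be logged after the decrements they justify. Otherwise a crash could leave a count too low, which is the one unsafe direction.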
2008-09-11 21:50 hehe 2008-09-11 21:50 well I can easily pick up the pace 2008-09-11 21:50 it's just that, where we were is where the tux3 kernel port willa actully start 2008-09-11 21:51 I wish there was a: these are your primitives, this is how they function, know this and C and data structures and you don't need to know anything else linux specific 2008-09-11 21:51 can shapor put together one of those word clouds for tux3 university? 2008-09-11 21:51 in an ideal world 2008-09-11 21:51 word cloud? 2008-09-11 21:51 ah 2008-09-11 21:51 scrape the logs 2008-09-11 21:51 make a book ;-) 2008-09-11 21:51 you know, a clear definition of interfaces ;-) 2008-09-11 21:51 you know, the more common words are bigger 2008-09-11 21:51 full of swear words 2008-09-11 21:52 and embarrassing stories about certain well known kernel hackers 2008-09-11 21:52 who joined the U? 2008-09-11 21:52 maze, good louck with that 2008-09-11 21:52 bunch of people up there in the channel list 2008-09-11 21:52 good folks 2008-09-11 21:52 I'm seeing a lot of new names on the irc 2008-09-11 21:52 missed natalie tonight 2008-09-11 21:52 yes, it's bulking up 2008-09-11 21:52 so is the subscribe list 2008-09-11 21:52 and so are the checkins 2008-09-11 21:53 can ask for more 2008-09-11 21:53 what's that up to? 2008-09-11 21:53 you may want to have something selinux specific to optimize that straight in the metadata 2008-09-11 21:53 same for acl's 2008-09-11 21:53 tux3 subscribers are over 100 2008-09-11 21:53 a dictionary with a few hundred entries would optimize down all selinux entries on my machine down to 3 bytes 2008-09-11 21:53 MaZe: you refering to the xattrs? 
2008-09-11 21:53 maze, I'm waiting for the selinux people to smell the coffee and come tell use what we're doing right/wrong 2008-09-11 21:53 maze, I have reason to believe that will happen soon ;-) 2008-09-11 21:54 problem is: dictionary needs to be dynamic 2008-09-11 21:54 so basically atoms x 2 2008-09-11 21:54 maze, just what I'm thinking 2008-09-11 21:54 read the posts 2008-09-11 21:54 but again, that's kind of vital to be fast 2008-09-11 21:54 you'll see I addressed that specifically 2008-09-11 21:54 you know where I work, you know how much email I get 2008-09-11 21:54 ;-) 2008-09-11 21:54 the last thing I want to do when I'm 'done' with work is read more email 2008-09-11 21:54 the more details post talks about it I think 2008-09-11 21:55 well read it on the job 2008-09-11 21:55 still parsing (in 2nd window) 2008-09-11 21:55 everybody knows sre's have that kind of time ;-) 2008-09-11 21:55 yeah... 2008-09-11 21:55 when there are no data centers burning down 2008-09-11 21:56 it's job-related 2008-09-11 21:56 so xattr's don't need to be fast - (overly fast) - for anything - except for the parts that are actually used within the kernel 2008-09-11 21:56 got to keep the finger on the pulse of sel development 2008-09-11 21:56 ie. 
acl's and selinux 2008-09-11 21:56 yup 2008-09-11 21:56 got that in my post too 2008-09-11 21:56 selinux is on every file, acl only on special files 2008-09-11 21:56 haven't gotten there obviously yet 2008-09-11 21:57 the xattr interface kind of sucks for efficiency 2008-09-11 21:57 choke point 2008-09-11 21:58 another solution, is to support atoms for the important stuff, and leave the rest as strings 2008-09-11 21:58 that way you compress selinux/acl but leave the user stuff uncompressed 2008-09-11 21:59 don't have to deal with denial of service against atom space attacks 2008-09-11 21:59 kind of best of both worlds 2008-09-11 21:59 possibly allow a superblock list of optimized entries, and a utility (mount time option), to include a new atom 2008-09-11 22:00 that might be both simple (very) and efficient and trivial to implement 2008-09-11 22:00 flips: see the latest list post? 2008-09-11 22:01 -!- ebiederm(~eric@c-24-130-11-59.hsd1.ca.comcast.net) has left #tux3 2008-09-11 22:01 you still have to deal with negative lookups correctly, but that's an easy optimization 2008-09-11 22:01 maze, also an option, yes 2008-09-11 22:01 important = atom 2008-09-11 22:02 exactly 2008-09-11 22:02 negative lookups? 2008-09-11 22:02 and what is actually an atom is specified by the admin during (previous and current) mount 2008-09-11 22:02 konrad,not yet 2008-09-11 22:02 if you have fs with 'selinux' not being an atom 2008-09-11 22:02 and have that written out to disk as a string 2008-09-11 22:03 and then you remount with selinux as atom (so it promotes) 2008-09-11 22:03 then if you lookup selinux atom on a file with it from before the remount you won't find it, unless you search the string entries as well 2008-09-11 22:03 in which case lack of the field, must mean search for the string instead and promote if needed to atom 2008-09-11 22:04 but the atom table is part of the fs, so how does remoutn come into it? 
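MaZe's alternative keeps atoms only for names the admin chooses and stores everything else as strings. The subtle part is the negative lookup. An entry written as a string before its name was atomized must still be found, and it can be promoted along the way. Here is a sketch with hypothetical class and field names:

```python
from typing import Dict


class InodeXattrs:
    """Per-inode xattrs: admin-selected names stored by atom, the rest as strings."""

    def __init__(self, atomized: Dict[str, int]):
        self.atomized = atomized              # shared name -> atom map (hypothetical)
        self.by_atom: Dict[int, bytes] = {}   # compact entries
        self.by_name: Dict[str, bytes] = {}   # plain string entries

    def set(self, name: str, value: bytes) -> None:
        atom = self.atomized.get(name)
        if atom is None:
            self.by_name[name] = value
        else:
            self.by_name.pop(name, None)       # drop any pre-atomization copy
            self.by_atom[atom] = value

    def get(self, name: str) -> bytes:
        atom = self.atomized.get(name)
        if atom is None:
            return self.by_name[name]
        if atom in self.by_atom:
            return self.by_atom[atom]
        # Negative atom lookup: the entry may have been written as a string
        # before the name was atomized, so search the strings and promote it.
        value = self.by_name.pop(name)        # KeyError if truly absent
        self.by_atom[atom] = value
        return value
```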
2008-09-11 22:04 the atom table can't be shrunk 2008-09-11 22:04 but can have new entries added via mount options 2008-09-11 22:04 I sort of get it 2008-09-11 22:04 ie. mount -o atomize=selinux -t tux3 /dev/hda3 / 2008-09-11 22:04 would be worth a list post maybe 2008-09-11 22:05 and than at the beginning you atomize (awesome term) the entries you know will be common 2008-09-11 22:05 so the security.selinux 2008-09-11 22:05 anyway, I think the truth is, the refcounting is going to be so efficient that nobody will care about the slight overhead and will love the warm fuzzy feeling of compression 2008-09-11 22:05 the subpieces of security selinux (since it's split in 3 parts) 2008-09-11 22:05 and being able to use long xattr names without penalty 2008-09-11 22:05 refcounting does have issues with quota 2008-09-11 22:06 unless you count xattrs against user quota 2008-09-11 22:06 which you probably should... 2008-09-11 22:06 yes, the refcounting is primarily to address quota 2008-09-11 22:06 my solution has the benefit you don't need refcounts 2008-09-11 22:06 you still get optimal performance for anything that matters 2008-09-11 22:07 - what matters being selected by the admin (and you can compile in a list of default atoms into tux3, being what we grab from selinux in fedora or whatever) 2008-09-11 22:07 do acl's store numeric ids or text strings? 2008-09-11 22:07 (ids = uids/gids) 2008-09-11 22:07 I'd hope numeric... 2008-09-11 22:09 anyway, that way you should be able to store all selinux data straight along with the mtime/ctime in the inode, using up a 32bit int or something like that 2008-09-11 22:12 btw, you're wrong on the ACLs being the most important use of xattr's - selinux is _BY FAR_ 2008-09-11 22:14 maze, could you work it up into a post? 
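The `mount -o atomize=...` syntax above is a proposal from this discussion, not an existing tux3 option. Assuming that syntax, the sketch below parses the options and extends an append-only atom table. Existing atoms keep their numbers, and nothing is ever removed.

```python
from typing import List


def parse_atomize(options: str) -> List[str]:
    """Collect names from hypothetical 'atomize=NAME' entries in a mount option string."""
    names = []
    for opt in options.split(","):
        key, sep, value = opt.partition("=")
        if key == "atomize" and sep and value:
            names.append(value)
    return names


def extend_atom_table(table: List[str], names: List[str]) -> List[str]:
    """Append-only: existing atoms keep their numbers and the table never shrinks."""
    for name in names:
        if name not in table:
            table.append(name)
    return table
```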
2008-09-11 22:14 btw 2008-09-11 22:14 if you look at man getfacl, you'll see: 2008-09-11 22:14 1: # file: somedir/ 2008-09-11 22:14 2: # owner: lisa 2008-09-11 22:14 3: # group: staff 2008-09-11 22:14 4: user::rwx 2008-09-11 22:14 5: user:joe:rwx #effective:r-x 2008-09-11 22:14 6: group::rwx #effective:r-x 2008-09-11 22:14 7: group:cool:r-x 2008-09-11 22:14 8: mask:r-x 2008-09-11 22:14 9: other:r-x 2008-09-11 22:14 10: default:user::rwx 2008-09-11 22:14 11: default:user:joe:rwx #effective:r-x 2008-09-11 22:14 I hope acl's are binary but I don't know yet 2008-09-11 22:14 12: default:group::r-x 2008-09-11 22:14 13: default:mask:r-x 2008-09-11 22:14 14: default:other:--- 2008-09-11 22:15 from which you'll notice that for permissions you want to fit the standard ugo+-rwx in the inode 2008-09-11 22:15 but also 2008-09-11 22:15 some of the other stuff which isn't per used 2008-09-11 22:15 s/used/user/ 2008-09-11 22:15 I can't tell if the other latest list post is spam or not 2008-09-11 22:15 so what is going to be cool is also doing atoms on the acl bodies 2008-09-11 22:15 which one? 2008-09-11 22:16 the mask on line 8 in particular 2008-09-11 22:16 "sir I want join your proyect" 2008-09-11 22:16 not spam 2008-09-11 22:16 definitely 2008-09-11 22:16 they do some clever spam nowadays 2008-09-11 22:16 directories also need the default ACLs off of lines 10-14 2008-09-11 22:16 konrad, you meant the latest from tero? 
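In the getfacl listing above, the mask entry caps the permissions of named users, the owning group and named groups. That is where the `#effective:` comments come from. The owner (`user::`) and `other` entries are not masked. Here is a small sketch of that rule (the function names are ours):

```python
from typing import Tuple

MASKED_TAGS = {"user", "group_obj", "group"}   # named users, owning group, named groups


def perm_bits(perms: str) -> int:
    """'r-x' -> 0o5"""
    return sum(bit for ch, letter, bit in zip(perms, "rwx", (4, 2, 1)) if ch == letter)


def perm_str(bits: int) -> str:
    """0o5 -> 'r-x'"""
    return "".join(letter if bits & bit else "-" for letter, bit in zip("rwx", (4, 2, 1)))


def parse_entry(line: str) -> Tuple[bool, str, str, str]:
    """Parse a getfacl entry such as 'user:joe:rwx #effective:r-x'."""
    fields = line.split("#", 1)[0].strip().split(":")
    default = fields[0] == "default"
    if default:
        fields = fields[1:]
    if len(fields) == 2:                       # mask:r-x, other:r-x
        kind, qualifier, perms = fields[0], "", fields[1]
    else:
        kind, qualifier, perms = fields
    tag = kind + "_obj" if kind in ("user", "group") and not qualifier else kind
    return default, tag, qualifier, perms


def effective(line: str, mask: str) -> str:
    """Permissions actually granted by an entry, given the ACL's mask."""
    _, tag, _, perms = parse_entry(line)
    if tag not in MASKED_TAGS:
        return perms
    return perm_str(perm_bits(perms) & perm_bits(mask))
```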
2008-09-11 22:16 which I think don't need special handling 2008-09-11 22:17 flips: nanden yen 2008-09-11 22:17 Not Tero 2008-09-11 22:17 yes, the best kind 2008-09-11 22:17 the problem with acl's though is that of course the amount of space they can take up is unbounded 2008-09-11 22:17 I'll respond 2008-09-11 22:17 maze, sure 2008-09-11 22:17 so only atomize the short ones 2008-09-11 22:17 although you can get it down to like 40 bits per entry 2008-09-11 22:18 so acl's don't really need atomization per say 2008-09-11 22:18 selinux needs atomization 2008-09-11 22:18 ah 2008-09-11 22:18 because selinux stores arbitrary strings 2008-09-11 22:18 I didn't realize the distinction 2008-09-11 22:19 it has to do compartments and stuff 2008-09-11 22:19 acl's basically store the above listed fields (user, user:uid, group, group:gid, mask, other) with 3 bits (rwx) and possibly the actual uid/gid (32 bits) 2008-09-11 22:19 that's not an acl as I understand it 2008-09-11 22:20 that is DAC 2008-09-11 22:20 course getfacl is definitive ;-) 2008-09-11 22:20 $ getfattr -d -m . 
-e text -h --absolute-names -L /etc 2008-09-11 22:20 # file: /etc 2008-09-11 22:20 security.selinux="system_u:object_r:etc_t:s0\000" 2008-09-11 22:20 so there you have an example selinux xattr 2008-09-11 22:20 the system_u object_r and etc_t are going to appear on tens of thousands of files 2008-09-11 22:20 -!- tim_dimm_(~timothyhu@adsl-67-114-40-138.dsl.scrm01.pacbell.net) has joined #tux3 2008-09-11 22:20 and come from a set of like maybe 200 entries on my system 2008-09-11 22:21 it's going to be very satisfying to compress "security.selinux" to 2 bytes 2008-09-11 22:21 so you need to have security.selinux xattr stored as a 32 bit field in the inode 2008-09-11 22:21 storing 10 bits per each of the 3 fields (u/r/t) 2008-09-11 22:21 and having a dictionary for each of those fields 2008-09-11 22:22 [actually 4 bytes for the entire thing most likely] 2008-09-11 22:22 we could build selinux fields directly into tux3 inodes if they're decent 2008-09-11 22:22 all tux3 attributes are optional 2008-09-11 22:22 exactly - you really want to do that 2008-09-11 22:22 so there's no real cost 2008-09-11 22:22 I guess we better do that 2008-09-11 22:22 could you write a post asking for that? 2008-09-11 22:22 and explaining what they are? 
:-) 2008-09-11 22:22 heh 2008-09-11 22:22 I'd need to run some stats gathering on my local system 2008-09-11 22:23 sure 2008-09-11 22:23 that's a yes I take it 2008-09-11 22:23 ACTION considers the ramyun deficiency problem 2008-09-11 22:24 ok, generating xattr dump from my machine 2008-09-11 22:24 I'll write something up 2008-09-11 22:24 about selinux/acl/other xattrs 2008-09-11 22:24 and what's important 2008-09-11 22:24 kay, I'll get some ranyun into me then prototype the refcounting 2008-09-11 22:24 and include something about my md5/sha1 idea above into it 2008-09-11 22:24 yah 2008-09-11 22:24 to point out why you want it to be user extensible 2008-09-11 22:25 nice 2008-09-11 22:25 for selinux you can even refuse to accept stuff from outside the dictionary 2008-09-11 22:26 although to be fair that's only settable by root, so might as well just transparently extend the dicts 2008-09-11 22:29 we can have some fun 2008-09-11 22:29 letting security folks play with stuff 2008-09-11 22:29 btw, here's a file with both selinux attrs, and extended acls (extra read rights to local user) 2008-09-11 22:29 $ getfattr -d -m . -e text -h --absolute-names -L junk 2008-09-11 22:29 # file: junk 2008-09-11 22:29 security.selinux="unconfined_u:object_r:default_t:s0\000" 2008-09-11 22:29 system.posix_acl_access="\002\000\000\000\001\000\007\000\377\377\377\377\002\000\004\000d\000\000\000\004\000\005\000\377\377\377\377\020\000\005\000\377\377\377\377 \000\005\000\377\377\377\377" 2008-09-11 22:29 Would you have guessed? 2008-09-11 22:30 everybody needs to have fun 2008-09-11 22:30 ah 2008-09-11 22:30 looks binary 2008-09-11 22:30 notice how 'local' (a user name) doesn't show up 2008-09-11 22:30 very 2008-09-11 22:30 instead the uid (100) does 2008-09-11 22:31 where is it? 2008-09-11 22:31 100 = 0144 = 'd' 2008-09-11 22:32 (there's extra stuff there that always gets set on any file with extended acls... basically cruft...) 
2008-09-11 22:32 there sure are a lot of 377s 2008-09-11 22:32 it looks like it uses 32-bit ints 2008-09-11 22:33 probably for the u/gids 2008-09-11 22:33 wow, that d is really well hidden 2008-09-11 22:33 hehe 2008-09-11 22:33 bad choice of username 2008-09-11 22:33 crappy dump 2008-09-11 22:33 see hexdump.c 2008-09-11 22:34 yeah ;-), well I asked for text 2008-09-11 22:34 $ getfattr -d -m . -e hex -h --absolute-names -L junk 2008-09-11 22:34 # file: junk 2008-09-11 22:34 security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a64656661756c745f743a733000 2008-09-11 22:34 system.posix_acl_access=0x0200000001000700ffffffff020004006400000004000500ffffffff10000500ffffffff20000500ffffffff 2008-09-11 22:34 still not readable - but better 2008-09-11 22:34 just installed the acl package 2008-09-11 22:34 I will try to clue up a bit 2008-09-11 22:34 the fs has to be mounted with acl support 2008-09-11 22:35 basically all you care about are getfattr/setfattr for xattr mods 2008-09-11 22:35 tux3 will always mount it xattr support, anyway 2008-09-11 22:35 and getfacl/setfacl for acl stuff 2008-09-11 22:35 I wonder that acl support is 2008-09-11 22:35 extra module? 
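The hex dump above can be decoded by hand. The sketch below assumes the layout of Linux's `posix_acl_xattr` format: a 32-bit little-endian version (2), followed by 8-byte entries. Each entry holds a 16-bit tag, 16-bit permission bits and a 32-bit id, where 0xFFFFFFFF means no qualifier.

```python
import struct
from typing import List, Optional, Tuple

ACL_TAGS = {0x01: "user_obj", 0x02: "user", 0x04: "group_obj",
            0x08: "group", 0x10: "mask", 0x20: "other"}
ACL_UNDEFINED_ID = 0xFFFFFFFF
ENTRY = struct.Struct("<HHI")          # tag, perm, id: 8 bytes, no padding


def decode_posix_acl(blob: bytes) -> List[Tuple[str, int, Optional[int]]]:
    """Decode a system.posix_acl_access value into (tag, perm, id) entries."""
    (version,) = struct.unpack_from("<I", blob, 0)
    if version != 2:
        raise ValueError(f"unexpected ACL xattr version {version}")
    if (len(blob) - 4) % ENTRY.size:
        raise ValueError("trailing bytes after last ACL entry")
    entries = []
    for off in range(4, len(blob), ENTRY.size):
        tag, perm, ident = ENTRY.unpack_from(blob, off)
        entries.append((ACL_TAGS.get(tag, hex(tag)), perm,
                        None if ident == ACL_UNDEFINED_ID else ident))
    return entries
```

Decoding the `junk` attribute yields five entries:

- the owner (rwx)
- named user 100 with read only (the `d`, 0x64)
- the owning group
- the mask
- other

So the 32-bit values are the ids, while tag and permissions are 16 bits each.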
2008-09-11 22:35 to actually respect extended acls stored as xattrs 2008-09-11 22:35 $ cat /proc/mounts 2008-09-11 22:35 /dev/root / ext3 rw,relatime,errors=continue,user_xattr,acl,data=ordered 0 0 2008-09-11 22:35 notice user_xattr,acl 2008-09-11 22:36 I wonder what acl does 2008-09-11 22:36 user xattr allows setting user.* xattrs (not selinux nor acl) 2008-09-11 22:36 could always read the fscking source 2008-09-11 22:36 hehe 2008-09-11 22:36 man 5 attr 2008-09-11 22:36 ah, well tux3 will yet users set xattrs by default 2008-09-11 22:36 man 5 acl 2008-09-11 22:36 no reason not to 2008-09-11 22:37 I meant, what does it do in ext3 mount 2008-09-11 22:37 uhm, I think that's actually a vfs switch - so you have no choice ;-) 2008-09-11 22:37 it seems strange you'd have to ask for it 2008-09-11 22:37 notice you don't have to ask for htree ;-) 2008-09-11 22:37 I think it's the default nowadays 2008-09-11 22:37 that's a pretty big deal 2008-09-11 22:37 not sure though 2008-09-11 22:37 what is? 2008-09-11 22:37 not having to ask for htree 2008-09-11 22:38 you can mount -o remount,acl/noacl 2008-09-11 22:38 you get dir indexing by default 2008-09-11 22:38 even though it's horribly complex 2008-09-11 22:38 you've lost me - how does user_xattr and acl correspond to htree/dir indexing? 2008-09-11 22:38 doesn't 2008-09-11 22:38 ah 2008-09-11 22:38 just talking about defaults 2008-09-11 22:39 it makes no sense you'd have to ask for xattr or acl 2008-09-11 22:39 if you want to prevent you users from mucking around with it 2008-09-11 22:39 why would you? 
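Checking which xattr-related options a filesystem was mounted with only requires splitting the `/proc/mounts` line shown above. Here is a sketch (the function name is ours):

```python
from typing import Dict, Union


def parse_mounts_line(line: str) -> Dict[str, object]:
    """Split one /proc/mounts line into its fields and an option dictionary.

    Note: /proc/mounts escapes spaces in paths as \\040; this sketch does not unescape them.
    """
    device, mountpoint, fstype, options = line.split()[:4]
    opts: Dict[str, Union[str, bool]] = {}
    for opt in options.split(","):
        key, sep, value = opt.partition("=")
        opts[key] = value if sep else True
    return {"device": device, "mountpoint": mountpoint, "fstype": fstype, "options": opts}
```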
2008-09-11 22:39 I believe the reason is 2008-09-11 22:40 (mind you both are ext3 options, not vfs I believe) 2008-09-11 22:40 true 2008-09-11 22:40 it requries a newer version of the ext3 superblock 2008-09-11 22:40 so does htree 2008-09-11 22:40 so you need it for backward compatibility 2008-09-11 22:40 ah 2008-09-11 22:40 that is the difference 2008-09-11 22:40 htree was forward compatible 2008-09-11 22:40 in case you don't want to generate xattrs on the fs 2008-09-11 22:40 right 2008-09-11 22:40 now it makes sense 2008-09-11 22:40 well 2008-09-11 22:40 tux3 doesn't have that problem 2008-09-11 22:40 forward? or backward? 2008-09-11 22:41 backward 2008-09-11 22:41 there is no backward, but of course we need to plan for forward 2008-09-11 22:41 could an old system r/w a newer fs with data stored in htreE? 2008-09-11 22:41 yes 2008-09-11 22:41 cute, no? 2008-09-11 22:41 very tricky to make that happen 2008-09-11 22:41 ah, ok, then clearly need no option if it's better 2008-09-11 22:41 cute - yes! 2008-09-11 22:41 right, it was never worse, that was another cute thing 2008-09-11 22:41 wicked. 2008-09-11 22:42 because it would fall back to _exactly_ the old code at the crossover point 2008-09-11 22:42 hehe 2008-09-11 22:42 which turned out to be two dirent blocks 2008-09-11 22:42 at two blocks htree was already faster 2008-09-11 22:42 so it just creates the index when the first block overfills 2008-09-11 22:43 and at that point that's still a cheap op 2008-09-11 22:44 right 2008-09-11 22:44 htree is really fast 2008-09-11 22:45 dirops measured in tens of usec, even back then 2008-09-11 22:46 ugh, I wish I could do some coding... 
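The htree compatibility trick described above keeps the directory a plain linear dirent block until the first block overflows, and only then builds an index. It can be modeled in a few lines. This is a toy, not ext3 code: the hash is a stand-in rather than the dx hash, and leaf splitting is simplified.

```python
import bisect
import hashlib
from typing import List, Optional


def name_hash(name: str) -> int:
    """Stand-in 32-bit hash (not ext3's dx hash)."""
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "little")


class ToyHtreeDir:
    """Toy directory: one linear dirent block until it overflows, then hash-indexed leaves."""

    def __init__(self, block_capacity: int = 4):
        self.cap = block_capacity
        self.blocks: List[List[str]] = [[]]
        self.keys: Optional[List[int]] = None   # sorted lowest hash of each leaf
        self.leaves: List[int] = []             # block number for each key

    def _leaf(self, h: int) -> int:
        return self.leaves[bisect.bisect_right(self.keys, h) - 1]

    def lookup(self, name: str) -> bool:
        if self.keys is None:
            return name in self.blocks[0]       # unindexed: the plain linear scan
        return name in self.blocks[self._leaf(name_hash(name))]

    def add(self, name: str) -> None:
        if self.keys is None:
            self.blocks[0].append(name)
            if len(self.blocks[0]) > self.cap:  # first overflow: index one block's worth
                self.keys, self.leaves = [0], [0]
                self._split(0)
            return
        leaf = self._leaf(name_hash(name))
        self.blocks[leaf].append(name)
        if len(self.blocks[leaf]) > self.cap:
            self._split(leaf)

    def _split(self, leaf: int) -> None:
        names = sorted(self.blocks[leaf], key=name_hash)
        pivot = name_hash(names[len(names) // 2])
        low = [n for n in names if name_hash(n) < pivot]
        if not low:                             # identical hashes: leave block oversized
            return
        self.blocks[leaf] = low
        self.blocks.append([n for n in names if name_hash(n) >= pivot])
        pos = bisect.bisect_right(self.keys, pivot)
        self.keys.insert(pos, pivot)
        self.leaves.insert(pos, len(self.blocks) - 1)
```

Until the first overflow, `lookup` is exactly the old linear scan. Building the index at that point is cheap because only one block's worth of names exists.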
some real low level put-nose-in-the-deep low-level hackery 2008-09-11 22:47 you can, after your post ;-) 2008-09-11 22:47 that in itself will take a while to write 2008-09-11 22:48 it's job-related 2008-09-11 22:49 good security keeps data centers from catching fire 2008-09-11 22:50 uhm, not so sure about that ;-) 2008-09-11 22:50 they catch fire for totally non-security related reasons 2008-09-11 22:51 ok, _sometimes_ keeps data centers from catching fire 2008-09-11 22:51 could potentially, one day, keep a fire from starting 2008-09-11 22:52 you know alan cox figure out how to remotely disable the temperture override on intel processors? 2008-09-11 22:52 he could literally melt down processors remotely 2008-09-11 22:52 let me see 2008-09-11 22:53 make sure that isn't apocryphal 2008-09-11 22:53 pretty sure not 2008-09-11 22:55 heh 2008-09-11 22:56 I think our machines would lose power at that point, although I'm not sure 2008-09-11 22:56 care to let him try? ;-) 2008-09-11 22:56 it's rather odd that you can do that in software 2008-09-11 22:57 you'd think: overheat is overheat 2008-09-11 22:57 there should be no gate on it 2008-09-11 22:57 I think they have gates on stuff like that for a simple reason 2008-09-11 22:58 they don't know where the overheat point is until after they've built and tested the cpu 2008-09-11 22:58 some batches are better than others 2008-09-11 22:58 those get sold as higher frequency cpus, and or with more cache 2008-09-11 22:59 today faster/more expensive/more top of the line cpus are cpus with less broken parts 2008-09-11 22:59 less of the cache disabled - because it didn't work, less power consumption at higher speed, less heat generated, better freqeuency tolerances etc 2008-09-11 23:00 less alu units disabled (there are always spares) 2008-09-11 23:00 less cores disabled 2008-09-11 23:00 already the 486sx was a dx with the floating point unit disabled because it failed qa 2008-09-11 23:01 -!- flips(~phillips@phunq.net) has left #tux3 
2008-09-11 23:01 -!- flips(~phillips@phunq.net) has joined #tux3 2008-09-11 23:01 creating some excellent virus payload opportunities 2008-09-11 23:01 huh? 2008-09-11 23:02 course, it's usually not in the interest of a virus to physicall destroy its host 2008-09-11 23:02 oh, as in burn the cpu virus? 2008-09-11 23:02 right, a real HACF instruction 2008-09-11 23:02 spread, mutate, exterminate later? 2008-09-11 23:02 HACF? 2008-09-11 23:02 Halt And Catch Fire 2008-09-11 23:02 $ cat /tmp/xattr.se | wc -l 2008-09-11 23:02 383 2008-09-11 23:03 so I have 383 different security.selinux xattr entries on my laptop 2008-09-11 23:03 highly compressable if that's what you're syaing 2008-09-11 23:03 yup 2008-09-11 23:03 MaZe: how'd you count? 2008-09-11 23:04 sort | uniq -c | wc -l 2008-09-11 23:04 er, before that bit 2008-09-11 23:04 this is going to hurt your eyes ;-) 2008-09-11 23:04 find / | xargs getxattr (something)? 2008-09-11 23:04 find / -xdev -print0 | xargs -0 -n 1 getfattr -d -m . -e text -h --absolute-names -L | egrep '^security\.selinux=' | sort | uniq -c | wc -l 2008-09-11 23:04 thanks 2008-09-11 23:05 relatively unpainful line of shell 2008-09-11 23:05 I actually dumped to file in their 2008-09-11 23:05 no perl or awk ;-) 2008-09-11 23:05 so hope I didn't mix that up 2008-09-11 23:05 don't know perl 2008-09-11 23:05 avoid awk 2008-09-11 23:05 abuse egrep and sed instead 2008-09-11 23:05 sed is good for some nice chicken tracks 2008-09-11 23:05 base64 to base16 conversion in sed anyone? 2008-09-11 23:06 ouch, really? 2008-09-11 23:06 or the other way? 2008-09-11 23:06 or base32? 2008-09-11 23:06 or to binary? 
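The shell pipeline above (`egrep ... | sort | uniq -c | wc -l`) counts distinct contexts. The Python sketch below computes the same count, plus per-field frequencies for the user, role, type and level parts. It assumes `getfattr -d -m . -e text` output as input.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

PREFIX = 'security.selinux="'


def selinux_contexts(lines: Iterable[str]) -> Iterator[str]:
    """Extract context strings from `getfattr -d -m . -e text` output."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith(PREFIX):
            continue
        value = line[len(PREFIX):]
        if value.endswith('"'):
            value = value[:-1]
        if value.endswith("\\000"):            # getfattr's text escape for the trailing NUL
            value = value[:-4]
        yield value


def context_stats(lines: Iterable[str]) -> Tuple[Counter, List[Counter]]:
    """Count distinct contexts and how often each user/role/type/level value occurs."""
    distinct = Counter(selinux_contexts(lines))
    per_field: List[Counter] = [Counter() for _ in range(4)]
    for context, n in distinct.items():
        # maxsplit=3 keeps an MLS range such as 's0-s0:c0.c1023' in one piece
        for i, part in enumerate(context.split(":", 3)):
            per_field[i][part] += n
    return distinct, per_field
```

`len(distinct)` corresponds to the `uniq -c | wc -l` count.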
2008-09-11 23:06 really ;-) 2008-09-11 23:06 ow :) 2008-09-11 23:07 (obviously you go through binary) 2008-09-11 23:08 so I split up my selinux strings, here's the results (there's 4 : seperated pieces) 2008-09-11 23:08 $ cat /tmp/xattr.se1 2008-09-11 23:08 679862 system_u 2008-09-11 23:08 24514 unconfined_u 2008-09-11 23:08 $ cat /tmp/xattr.se2 2008-09-11 23:08 704376 object_r 2008-09-11 23:08 $ cat /tmp/xattr.se3 | wc -l 2008-09-11 23:08 356 2008-09-11 23:08 $ cat /tmp/xattr.se4 2008-09-11 23:08 704376 s0 2008-09-11 23:09 so, yeah, badly needs a dict 2008-09-11 23:09 how about base63 to base15? 2008-09-11 23:09 just one less 2008-09-11 23:09 lol 2008-09-11 23:09 how hard could that be 2008-09-11 23:10 ah, but how useful would that be? 2008-09-11 23:10 ACTION feels that way about base64 2008-09-11 23:11 base64 is nice and concise, when you still want something printable (ie. a filename) 2008-09-11 23:11 but doesn't need to be remembered (still needs to be cut'n'pasteable though) 2008-09-11 23:11 see, I just didn't arrive on the planet as a sysop 2008-09-11 23:11 so yeah files with names being base64 encoding of content hash... 2008-09-11 23:12 useful stuff ;-) 2008-09-11 23:12 good thing lots of sysops like to hang around filesystem projects 2008-09-11 23:12 makes for some fancy scripts 2008-09-11 23:12 $ cat /tmp/xattr.se3 | sort -n | tail -n 8 2008-09-11 23:12 6125 modules_object_t 2008-09-11 23:12 13658 locale_t 2008-09-11 23:12 16690 man_t 2008-09-11 23:12 25337 src_t 2008-09-11 23:12 26456 lib_t 2008-09-11 23:12 32875 user_home_t 2008-09-11 23:12 109266 usr_t 2008-09-11 23:12 461436 default_t 2008-09-11 23:13 so you can see the long tail distribution of the 3rd element 2008-09-11 23:13 that's with all the _t ? 2008-09-11 23:13 type 2008-09-11 23:13 right, which I associate with source code 2008-09-11 23:13 or somit 2008-09-11 23:13 is that source? 
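The statistics above show 2 users, 1 role, 356 types and 1 level across roughly 700,000 labels. They suggest per-field dictionaries packed into a single 32-bit inode field, as proposed earlier in the discussion. The sketch below assumes hypothetical field widths: 10 bits each for user, role and type, and 2 bits for the level.

```python
from typing import Dict, List

FIELD_BITS = (10, 10, 10, 2)   # hypothetical widths: user, role, type, level (32 bits total)


class ContextPacker:
    """Pack 'user:role:type:level' SELinux contexts into one 32-bit word via per-field dictionaries."""

    def __init__(self) -> None:
        self.index: List[Dict[str, int]] = [{} for _ in FIELD_BITS]   # value -> small int
        self.values: List[List[str]] = [[] for _ in FIELD_BITS]      # small int -> value

    def pack(self, context: str) -> int:
        parts = context.split(":", 3)            # the level part may itself contain ':'
        if len(parts) != len(FIELD_BITS):
            raise ValueError(f"not a user:role:type:level context: {context!r}")
        # Check capacity first so a failed pack does not leave partial entries.
        for i, (part, bits) in enumerate(zip(parts, FIELD_BITS)):
            if part not in self.index[i] and len(self.values[i]) >= 1 << bits:
                raise OverflowError(f"field {i} dictionary is full")
        word = 0
        for i, (part, bits) in enumerate(zip(parts, FIELD_BITS)):
            if part not in self.index[i]:
                self.index[i][part] = len(self.values[i])
                self.values[i].append(part)
            word = (word << bits) | self.index[i][part]
        return word

    def unpack(self, word: int) -> str:
        parts = []
        for i in reversed(range(len(FIELD_BITS))):
            bits = FIELD_BITS[i]
            parts.append(self.values[i][word & ((1 << bits) - 1)])
            word >>= bits
        return ":".join(reversed(parts))
```

The 356 types fit comfortably in 10 bits (1,024 entries). Splitting with `maxsplit=3` keeps an MLS range such as `s0-s0:c0.c1023` intact as the level part.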
u = user 2008-09-11 23:13 r = role 2008-09-11 23:14 t = type 2008-09-11 23:14 ah 2008-09-11 23:14 parts of the triplet have different suffixes 2008-09-11 23:14 it's basically always .*_u:.*_r:.*_t 2008-09-11 23:14 let me see, that's MAC terminology 2008-09-11 23:14 not surprising there... 2008-09-11 23:14 not sure what exactly the last quad (s0) is 2008-09-11 23:14 possibly range? 2008-09-11 23:15 anyway, on a bigger box, there would obviously be more 2008-09-11 23:15 but we're not talking about a lot here - just the order of hundreds to thousands 2008-09-11 23:16 instructive 2008-09-11 23:16 I'm going to have to absorb this as we go 2008-09-11 23:16 but anyway 2008-09-11 23:16 I think the last quad isn't used for anything yet but is reserved for future stuff? 2008-09-11 23:16 seem to be on the right track, just by dumb luck 2008-09-11 23:16 maybe, it's newer than the rest 2008-09-11 23:16 err 2008-09-11 23:16 no, that's not right 2008-09-11 23:16 hold on 2008-09-11 23:17 oh, I know 2008-09-11 23:17 if user sets a selinux context which isn't atomized - he gets EPERM, if root, the atom table is auto-extended 2008-09-11 23:18 deals with DoS correctly 2008-09-11 23:18 or make it a mount option, selinux-auto-atomize={always,if-root,never} 2008-09-11 23:21 atomize is the new tux3 thing? 2008-09-11 23:24 215 different selinux states here 2008-09-11 23:25 :-) 2008-09-11 23:25 well atomize is descriptive in what it does to the number of bytes 2008-09-11 23:25 heh 2008-09-11 23:25 what is the average length of an acl? 2008-09-11 23:26 and what is the percentage of files that have them? 2008-09-11 23:26 selinux? all 2008-09-11 23:26 extended acl? almost none 2008-09-11 23:26 avg length of selinux acl... working 2008-09-11 23:27 security.selinux="...:...:...:..\000"[newline] - avg at 53 2008-09-11 23:27 so like 48 in memory 2008-09-11 23:28 extended acl: 2008-09-11 23:28 system.posix_acl_access="..."
with minimum length of 2008-09-11 23:29 so we will compress those about 25/1 2008-09-11 23:29 61 or so 2008-09-11 23:29 30/1 then 2008-09-11 23:30 minimum length 2008-09-11 23:30 oh 2008-09-11 23:30 well 2008-09-11 23:30 well selinux is everywhere - and compressed to 4 bytes 2008-09-11 23:30 "significant" metadata compression coming up 2008-09-11 23:30 acl is not everywhere... 2008-09-11 23:30 ah 2008-09-11 23:30 so only 2/1 2008-09-11 23:30 when we compress the 4 bytes to 2-byte atoms 2008-09-11 23:30 in a selinux system every file has to have a selinux context 2008-09-11 23:30 but not every file has to have extended acls 2008-09-11 23:31 I see 2008-09-11 23:31 so it depends on how you do the selinux compression 2008-09-11 23:31 how badly you want to compress it 2008-09-11 23:31 look up every xattr body in the atom table 2008-09-11 23:31 easy 2008-09-11 23:31 the most relaxed method uses 16 bytes 2008-09-11 23:31 -!- stargazr5(~gauravstt@59.95.38.255) has joined #tux3 2008-09-11 23:31 the most compressed 2 bytes 2008-09-11 23:32 I see 2008-09-11 23:32 varying levels of extensibility 2008-09-11 23:32 future-proofing 2008-09-11 23:32 you could do selinux compression like standard unix privileges compression 2008-09-11 23:32 yes, but how big do you make the bitfields? 2008-09-11 23:32 i.e.
if it's the same as the parent directory, don't store it in the inode 2008-09-11 23:32 so they've already done a pretty good job of compressing bodies 2008-09-11 23:32 it's the xattr labels that will stick out 2008-09-11 23:32 you don't need the xattr labels at all 2008-09-11 23:33 the selinux and acl xattr labels are trivially obvious well-known labels, that you fake on xattr access 2008-09-11 23:33 konrad, not there's no way to find the partent directory actually 2008-09-11 23:33 so if we do that, it will be the containing inode table block 2008-09-11 23:33 or region 2008-09-11 23:34 "note there's no way to find the parent directory actually 2008-09-11 23:34 " 2008-09-11 23:34 so you don't need to store them as xattrs anyway 2008-09-11 23:34 hardlinks 2008-09-11 23:34 hard to type and eat ramyun at the same time 2008-09-11 23:34 inode can exist in multiple dirs 2008-09-11 23:35 time to do my atom refcount post 2008-09-11 23:35 I'll post the design, then implement 2008-09-11 23:35 like a good boy should 2008-09-11 23:36 also, note that xattrs 2008-09-11 23:36 basically come in a couple varieties 2008-09-11 23:36 security.* system.* trusted.* user.* 2008-09-11 23:37 so it's worthwhile to compress {security|system|trusted|user} regardless 2008-09-11 23:37 flips: er, sorry, yes, that. 2008-09-11 23:38 security is basically for selinux (& other such systems), system for extended acls (& other such system - capabilities), user for users, trusted (accessible to CAP_SYS_ADMIN) 2008-09-11 23:39 ACTION trashes the latest lucasarts game on slashdot 2008-09-11 23:39 got to keep focussed here 2008-09-11 23:39 what's the game? 2008-09-11 23:39 I just received sport 2008-09-11 23:39 erm, spore 2008-09-11 23:40 spore is on my never play list 2008-09-11 23:40 with the drm 2008-09-11 23:40 even if they're doing it to windows boxes 2008-09-11 23:40 there's drm?
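The compression ideas above fit in a short C sketch: encode the xattr name prefix ({security|system|trusted|user}) as a 2-bit namespace code, and map each xattr body to an atom number, extending the atom table on a miss only as the proposed selinux-auto-atomize={always,if-root,never} option allows. Every name here (`xattr_ns_encode`, `atom_lookup`, `struct atom_table`, the table sizes, returning `-ENOSPC` when full) is hypothetical; this is not the tux3 on-disk format.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical 2-bit codes for the four Linux xattr namespaces. */
static const char *const xattr_ns[] = { "security.", "system.", "trusted.", "user." };

/* Return namespace code 0..3 and point *suffix past the prefix, or -1. */
static int xattr_ns_encode(const char *name, const char **suffix)
{
    for (int i = 0; i < 4; i++) {
        size_t n = strlen(xattr_ns[i]);
        if (strncmp(name, xattr_ns[i], n) == 0) {
            *suffix = name + n;
            return i;
        }
    }
    return -1;
}

/* Mirrors the proposed mount option selinux-auto-atomize={always,if-root,never}. */
enum atomize_policy { ATOMIZE_ALWAYS, ATOMIZE_IF_ROOT, ATOMIZE_NEVER };

#define MAX_ATOMS    1024   /* hundreds to thousands of distinct bodies */
#define MAX_ATOM_LEN 256

struct atom_table {
    unsigned count;
    char body[MAX_ATOMS][MAX_ATOM_LEN];
};

/* Map an xattr body to its atom number, extending the table on a miss
 * only when the policy allows it.  Returns the atom (>= 0) or -errno.
 * A linear scan keeps the sketch short; a real table would hash. */
static int atom_lookup(struct atom_table *t, const char *body,
                       enum atomize_policy policy, int is_root)
{
    for (unsigned i = 0; i < t->count; i++)
        if (strcmp(t->body[i], body) == 0)
            return (int)i;

    if (policy == ATOMIZE_NEVER || (policy == ATOMIZE_IF_ROOT && !is_root))
        return -EPERM;   /* unprivileged users cannot grow the table (DoS guard) */
    if (t->count == MAX_ATOMS || strlen(body) >= MAX_ATOM_LEN)
        return -ENOSPC;
    strcpy(t->body[t->count], body);
    return (int)t->count++;
}
```

With at most a few thousand distinct bodies, the atom number fits the 2-byte atoms mentioned above.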
2008-09-11 23:40 hmm, well I have a mac 2008-09-11 23:40 it fscks with the registry 2008-09-11 23:40 for copy protection 2008-09-11 23:41 hacks the os security 2008-09-11 23:41 details are all over the web 2008-09-11 23:41 not that hacking windows security is all that leet... still it's agin the law 2008-09-11 23:42 hmm, all my windoze are safely in kvm cages 2008-09-11 23:42 you having fun with spore? 2008-09-11 23:42 makes for the fastest windows installs ever 2008-09-11 23:43 haven't launched it yet ;-) 2008-09-11 23:43 I had fun with civ revolutions 2008-09-11 23:43 never played a civ game before 2008-09-11 23:43 will need to reboot into mac 2008-09-11 23:43 surprised me, I didn't think I'd like it 2008-09-11 23:43 well 2008-09-11 23:43 post coming, for realz 2008-09-11 23:46 man setxattr: If extended attributes are not supported by the filesystem, or are disabled, errno is set to ENOTSUP. 2008-09-11 23:53 -!- kd(kdpict@118.94.53.35) has joined #tux3 2008-09-11 23:54 yah 2008-09-11 23:54 just replied 2008-09-12 00:00 as did I 2008-09-12 00:03 good to do this stuff in public 2008-09-12 00:03 keeps a useful record, and it's a subtle way of poking at the libxattr guys to fix their packages 2008-09-12 00:03 ACTION wonders who the libxattr guys are 2008-09-12 00:04 choose one: 1) redhat 2) suse 2008-09-12 00:05 http://oss.sgi.com/projects/xfs/ 2008-09-12 00:05 possible 2008-09-12 00:06 $ rpm -qf `which getfattr ` 2008-09-12 00:06 attr-2.4.41-1.fc9.x86_64 2008-09-12 00:06 it was a suse guy who put xattr+acl support into ext3 2008-09-12 00:06 $ rpm -qi attr | grep URL 2008-09-12 00:06 URL : http://oss.sgi.com/projects/xfs/ 2008-09-12 00:06 ah, ok 2008-09-12 00:06 -!- stargazr5(~gauravstt@59.95.35.250) has joined #tux3 2008-09-12 00:07 well all the slashdotters seem to agree that the new lucasarts game lacks in the gameplay department 2008-09-12 00:07 it's fun though 2008-09-12 00:08 for a few minutes 2008-09-12 00:08 throwing big chunks of metal at little robots 
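The `ENOTSUP` behaviour quoted from the setxattr man page also applies when reading, per getxattr(2). A small sketch of reading one label the way `getfattr -h` does (not following a final symlink) might look like this. The helper name `get_selinux_label` is hypothetical, and folding `ENOTSUP` and `ENODATA` (no such attribute) into "no label" is a design choice of the sketch, not a rule.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical helper: fetch the security.selinux label of path without
 * following a final symlink.  Returns the label length, 0 if the file has
 * no label or the filesystem has no xattr support (ENODATA / ENOTSUP),
 * or -errno for any other failure. */
static ssize_t get_selinux_label(const char *path, char *buf, size_t size)
{
    ssize_t n = lgetxattr(path, "security.selinux", buf, size);
    if (n >= 0)
        return n;
    if (errno == ENODATA || errno == ENOTSUP)
        return 0;
    return -errno;
}
```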
2008-09-12 00:08 making lightning come out of your fingers and not do much 2008-09-12 00:10 what's it called? 2008-09-12 00:11 force unleashed 2008-09-12 00:11 you play as a sith apprentice 2008-09-12 00:11 darth's personal waterboy 2008-09-12 00:14 flips: tried KOTOR? 2008-09-12 00:14 both parts are good 2008-09-12 00:14 enjoyed kotor 2008-09-12 00:15 force unleashed better ? 2008-09-12 00:16 but I don't want to play any more bioware games 2008-09-12 00:16 too cookie cutter 2008-09-12 00:16 even jade empire felt like kotor 2008-09-12 00:16 and I got spoiled by oblivion 2008-09-12 00:16 everything you couldn't do in kotor,you could do in oblivion 2008-09-12 00:16 nope. haven't played either 2008-09-12 00:16 yeah. oblivion was great. 2008-09-12 00:17 force unleashed is mildly entertaining 2008-09-12 00:17 they got some mechanics right, others badly wrong 2008-09-12 00:17 and it's way linear 2008-09-12 00:18 I'll probably get force unleashed 2008-09-12 00:18 but I don't expect much 2008-09-12 00:18 just filling time until bethesda comes up with something new ;-) 2008-09-12 00:18 :) 2008-09-12 00:19 thanks for ur support to the de-duplication idea 2008-09-12 00:19 will get back to you when we have something more concrete 2008-09-12 00:19 welcome, so that's you 2008-09-12 00:20 sure 2008-09-12 00:20 deduplication seems to be a big hot button 2008-09-12 00:20 yeah...me, stargazr5 , kd and another 2008-09-12 00:20 for new fs design 2008-09-12 00:20 ah 2008-09-12 00:20 were you here for the vfs tour today? 2008-09-12 00:21 no....missed it... :( but following it now. 2008-09-12 00:21 good 2008-09-12 00:21 how many years of C has each of you got? 2008-09-12 00:22 look through the logs - there's bound to be some jewels in there 2008-09-12 00:23 about 2 and half years 2008-09-12 00:23 MaZe: thanks. will do. 2008-09-12 00:24 and have you done an OS and/or FS course yet? 2008-09-12 00:24 I think this is for an advanced fs course, right? 
ACTION reads again 2008-09-12 00:25 yes. OS. 2008-09-12 00:26 how much low-level experience do you have? C / assembly interfaces? While it's not really needed, it comes in useful from time to time. 2008-09-12 00:27 [of course, IMHO, assembly is always useful to know... so I may be biased] 2008-09-12 00:27 I agree 2008-09-12 00:27 not as a first language 2008-09-12 00:27 but certainly as one of the first 5 2008-09-12 00:28 it's been years now since I've written any 2008-09-12 00:28 if I do write some, it's likely to be for some strange arch like Cell SPE 2008-09-12 00:28 oh, I've never written true assembly... it's always been inline, often entire procedures, but never entire programs (unless the entire program was a hundred lines or less) 2008-09-12 00:29 ah, I've written tens of thousands of lines 2008-09-12 00:29 wallowed in it 2008-09-12 00:29 have a bit of experience in assembly...not much.. 2008-09-12 00:29 oh, I've written tens of thousands of lines, never all in one piece though 2008-09-12 00:29 got really good at it, then realized there are people much, much better 2008-09-12 00:30 this is part of a project that we can do on any cs topic 2008-09-12 00:30 right 2008-09-12 00:30 The biggest pieces I've written were usually either asm-coded bigint adders and the like, or something like a boot sector 2008-09-12 00:30 just reread your post 2008-09-12 00:30 I've done stuff like transcoded knuth's algorithms for infinite precision math from MIX and x86 2008-09-12 00:31 neither of which has a lot of code, but either has huge performance boosts from assembly, or just needs to be in asm 2008-09-12 00:31 making the carries work out is hard ;-) 2008-09-12 00:31 MIX to x86 I mean 2008-09-12 00:31 oh, but that sort of stuff is still not pure asm, you write that in C with good macros and inline asm 2008-09-12 00:31 not me 2008-09-12 00:31 pure 2008-09-12 00:31 there is almost never a reason to write pure asm 2008-09-12 00:32 sure, when the OS isn't linux
and the compiler isn't gcc 2008-09-12 00:32 gcc asm syntax blows by the way ;-) 2008-09-12 00:32 true, if compiler != gcc, then hang yourself 2008-09-12 00:32 blows as in bad? or in good? 2008-09-12 00:32 bad 2008-09-12 00:32 sucks chunks 2008-09-12 00:32 the syntax is bad, but it is extremely powerful 2008-09-12 00:33 although it takes quite some getting used to 2008-09-12 00:33 I know, so why not have the syntax be good and be extremely powerful? 2008-09-12 00:33 hehe 2008-09-12 00:33 yeah, well... that's like 2008-09-12 00:33 C 2008-09-12 00:33 the syntax for C also blows 2008-09-12 00:33 kinda yes 2008-09-12 00:33 AT&T asm blows orders worse 2008-09-12 00:33 I think that's where it came from 2008-09-12 00:34 well 2008-09-12 00:34 let's not scare the visitors ;-) 2008-09-12 00:34 cranky old hacks 2008-09-12 00:35 a (*b(c d, e (*f)(g)))(h); 2008-09-12 00:35 what kind of syntax is that? 2008-09-12 00:35 next tux3 university? 2008-09-12 00:36 tue at 8 pm pacific 2008-09-12 00:36 tuesday 8 pm 2008-09-12 00:36 right, I tend to forget that timezone 2008-09-12 00:36 will be there this time. 2008-09-12 00:36 ok 2008-09-12 00:36 like the world revolves around silly valley 2008-09-12 00:36 :) 2008-09-12 00:36 :) 2008-09-12 00:36 doesn't it? 2008-09-12 00:36 not sure 2008-09-12 00:36 I didn't use to think so ;) 2008-09-12 00:37 I have some stories about silicon valley and time zones that I can't share ;-( 2008-09-12 00:37 nice to know there's a reason to get you drunk 2008-09-12 00:38 speaking of which 2008-09-12 00:38 a sake would help get this post written 2008-09-12 00:38 or should I just hack 2008-09-12 00:38 hmm 2008-09-12 00:38 I mean, with the sake in hand of course 2008-09-12 00:39 [btw, that declaration above, that's valid C - that's the declaration of signal from the std C library, where a,e=void b=signal c,g,h=int d=sig f=func] 2008-09-12 00:39 talking about timezones... it's lunch time here... i am off..
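Filling in the letters gives `void (*signal(int sig, void (*func)(int)))(int);`: read from the name outward, `signal` takes an `int` and a pointer to a function taking `int` and returning `void`, and returns a pointer of that same kind. A typedef makes this readable. The sketch below checks the reading at compile time: the typedef'd spelling must match the real `signal` from `<signal.h>`, or the initialisation of `signal_ptr` fails to compile. The names `sig_handler`, `signal_fn`, `signal_ptr` and `on_sigint` are made up for illustration.

```c
#include <signal.h>

/* void (*)(int): pointer to a function taking int, returning void.
 * Illustrative name, not the one <signal.h> uses internally. */
typedef void (*sig_handler)(int);

/* The quoted declaration, respelled with the typedef:
 *     void (*signal(int sig, void (*func)(int)))(int);
 * becomes "sig_handler signal(int sig, sig_handler func)". */
typedef sig_handler signal_fn(int sig, sig_handler func);

/* Compiles only if both spellings denote the same function type. */
static signal_fn *const signal_ptr = signal;

static volatile sig_atomic_t got_signal;

static void on_sigint(int sig)
{
    (void)sig;
    got_signal = 1;
}
```

Calling `signal_ptr(SIGINT, on_sigint)` then behaves exactly like calling `signal` directly.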
wow 2008-09-12 00:40 me too 2008-09-12 00:41 see you cdk, stargazr5 2008-09-12 00:42 you know I can read that without thinking? 2008-09-12 00:42 that's scary 2008-09-12 00:42 also used to do hex multiply/divide in my head at my geekiest 2008-09-12 00:42 still can do it, more slowly 2008-09-12 00:50 the C syntax above? without thinking? really? that is scary 2008-09-12 00:50 that is like the ugliest part of C... 2008-09-12 00:51 arguably 2008-09-12 00:51 personally, I think const is 2008-09-12 00:51 dreamed up by a sadist 2008-09-12 00:51 oh, const is relatively simple to parse though 2008-09-12 00:51 especially if you write it so 'char const *' instead of 'const char *' 2008-09-12 00:51 but a devilishly effective makework project 2008-09-12 00:52 and then read from the back, since C is mostly read from the back/center anyway 2008-09-12 00:52 bottom-up reading is a powerful organizing force 2008-09-12 00:52 I so much prefer Pascal syntax for type definitions 2008-09-12 00:52 you just read it left to right 2008-09-12 00:52 me too 2008-09-12 00:53 but pascal as a whole is just plain irritating 2008-09-12 00:53 if the thing's a pointer it says so in the first character 2008-09-12 00:53 I am also offended by == 2008-09-12 00:53 irritating? a little long-winded, true, but so frickin' easy to understand 2008-09-12 00:53 and friends 2008-09-12 00:53 yes, I prefer the pascal := and = as opposed to = and == 2008-09-12 00:54 I don't mind != nor <> - doesn't really matter to me 2008-09-12 00:54 the most pleasant language I've worked in is Pick BASIC 2008-09-12 00:54 as modified by a friend of mine 2008-09-12 00:54 not aware of that - does that differ from basic? 2008-09-12 00:54 most of the stupidities gone 2008-09-12 00:54 and missing stuff you need in, like structuring primitives 2008-09-12 00:54 [are you aware modern OO Pascal like Delphi, like Free Pascal, is 32-bit and has object oriented programming, operator overloading, function overloading, etc...]
2008-09-12 00:55 it's very different from basic 2008-09-12 00:55 totally not anything like msft basic 2008-09-12 00:55 h 2008-09-12 00:55 ehrm, ah 2008-09-12 00:55 I haven't used the latest borland stuff, no 2008-09-12 00:55 msft liked it enough to headhunt the guy as I recall 2008-09-12 00:55 and we got c# :p 2008-09-12 00:56 another flavor of C-that-blows 2008-09-12 00:56 C# is just a flavour of java 2008-09-12 00:56 with the worst of C added in 2008-09-12 00:58 I want a melding of pascal (type declaration syntax, ease of reading code), Java (generics, interfaces [extended]), gnu-ism (inline asm power), C (low level control), C++ (some of the OO, dropping multiple inheritance), not sure what to do with some things [exceptions] 2008-09-12 00:59 let me know when you have code to try 2008-09-12 00:59 make it a very strongly typed language, drop most of the legacy crap, support useful UI candy (type in constants in any base, etc...) 2008-09-12 01:00 don't forget to make it interactive 2008-09-12 01:00 and managed 2008-09-12 01:00 and semicolons optional 2008-09-12 01:00 being able to recompile program blocks and replace them on the fly - yes I've wanted that ;-) 2008-09-12 01:00 managed? 2008-09-12 01:00 likewise parens, including fn call parens 2008-09-12 01:00 a function needs parens 2008-09-12 01:00 managed = can't segfault 2008-09-12 01:00 a procedure doesn't 2008-09-12 01:01 what does that mean? 2008-09-12 01:01 can't segfault? 2008-09-12 01:01 yes. 2008-09-12 01:01 lisp can't segfault in principle 2008-09-12 01:01 java can't either 2008-09-12 01:01 oh, but then it's not low-level 2008-09-12 01:01 I'm really concerned as to why 65% of my memory is in use (not including cache) 2008-09-12 01:01 in principle. 2008-09-12 01:01 konrad, in what context? 2008-09-12 01:01 the above would be a language you could write a kernel in 2008-09-12 01:02 konrad, running mozilla? 
2008-09-12 01:02 oh 2008-09-12 01:02 flips: yeah, but that's only eating 600M or something 2008-09-12 01:02 I have 6 gigs 2008-09-12 01:02 I've got 45% in programs on a 4 g machine 2008-09-12 01:02 ncie 2008-09-12 01:02 nice 2008-09-12 01:02 50% cache 2008-09-12 01:02 yeah. I'm concerned. 2008-09-12 01:02 need to get yourself a memory map 2008-09-12 01:02 from proc 2008-09-12 01:02 there must be a tool 2008-09-12 01:03 (and in my case that probably is a gig and a half of firefox3) 2008-09-12 01:03 wicked 2008-09-12 01:03 13457 maze 20 0 103m 12m 9.9m S 0.0 0.3 0:29.40 gnome-power-man 2008-09-12 01:04 103M! 2008-09-12 01:04 3409 root 20 0 1454m 1.1g 28m S 5.6 18.0 531:18.61 Xorg 2008-09-12 01:04 just say gno to gnome 2008-09-12 01:04 but that's still insubstantial relative to 6 2008-09-12 01:05 kmail's using 500M 2008-09-12 01:05 what do you get from cat /proc/meminfo? 2008-09-12 01:05 pastie maybe? 2008-09-12 01:06 http://pastie.caboo.se/271078 2008-09-12 01:10 .6 gig of buffers, woof 2008-09-12 01:11 A gig into swap 2008-09-12 01:11 that's braindamage 2008-09-12 01:11 something is using 2.7 gig of straight memory 2008-09-12 01:11 should not be hard to find 2008-09-12 01:11 well 2008-09-12 01:11 X leaks often 2008-09-12 01:11 if you don't see it in the processes then yes, that is worrisome 2008-09-12 01:12 but you'd see the X usage even when it's leaking 2008-09-12 01:12 you know Shift-M with top? 
2008-09-12 01:12 I think that's the one 2008-09-12 01:12 shows your proces in rss order 2008-09-12 01:12 or is it total vm size 2008-09-12 01:12 one of those 2008-09-12 01:12 vm size I think 2008-09-12 01:13 in order, Xorg, firefox, nautilus, gnome-panel, kmail, pidgin, gnome-terminal 2008-09-12 01:13 and some others 2008-09-12 01:13 VmallocTotal: 34359738367 kB 2008-09-12 01:13 that has to be broken 2008-09-12 01:14 VmallocChunk: 34359675895 kB 2008-09-12 01:14 haven't spent much time crawling in vm lately 2008-09-12 01:14 you might want to go onto #mm on this server 2008-09-12 01:15 and complain about that vmalloctotal 2008-09-12 01:15 it's late enough that I can't be arsed 2008-09-12 01:15 works for me 2008-09-12 01:15 alright, I got back a significant chunk of it by ditching firefox 2008-09-12 01:16 down to 51% used 2008-09-12 01:16 still 2008-09-12 01:17 check your anon 2008-09-12 01:17 I don't think firefox was using that 2008-09-12 01:18 600M of it went away after closing firefox 2008-09-12 01:18 leaving 2.1 gig in anon? 2008-09-12 01:18 that's broken 2008-09-12 01:18 yes 2008-09-12 01:19 check top 2008-09-12 01:19 for what? 2008-09-12 01:19 shift-M 2008-09-12 01:19 look at virtual size 2008-09-12 01:19 and rss 2008-09-12 01:19 fes 2008-09-12 01:19 res 2008-09-12 01:20 xorg has 1471m virt 1.1g res, nautilus 767m virt 212m res, kmail 536m virt 146m res 2008-09-12 01:20 top 3 2008-09-12 01:20 fsking pigs 2008-09-12 01:20 wtf is X doing? 2008-09-12 01:21 nautilus... 
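The numbers being compared above (buffers, anonymous memory, swap) all come from `/proc/meminfo`, which uses a "Name:   value kB" layout. A minimal C sketch for pulling individual fields out of it follows; `meminfo_kb` and `read_meminfo` are hypothetical helper names, and the parser assumes that standard layout rather than handling every possible line.

```c
#include <stdio.h>
#include <string.h>

/* Find "key:" at the start of a line in /proc/meminfo-style text and
 * return its value in kB, or -1 if the key is absent or malformed. */
static long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);

    for (const char *line = text; line && *line; ) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;
            return sscanf(line + klen + 1, "%ld", &kb) == 1 ? kb : -1;
        }
        line = strchr(line, '\n');   /* advance to the next line */
        if (line)
            line++;
    }
    return -1;
}

/* Read /proc/meminfo (Linux only) into buf as a NUL-terminated string. */
static int read_meminfo(char *buf, size_t size)
{
    if (size == 0)
        return -1;
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, size - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}
```

For example, comparing `meminfo_kb(buf, "AnonPages")` and `meminfo_kb(buf, "Buffers")` against `MemTotal` gives the same picture as eyeballing the paste.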
:p 2008-09-12 01:21 shh :) 2008-09-12 01:21 I'd say you' 2008-09-12 01:21 I'd say you've got a simple case of out-of-control X plus 4 x bloatware 2008-09-12 01:21 sounds about right 2008-09-12 01:22 I'd call the X part a bug 2008-09-12 01:22 the other is just sloth 2008-09-12 01:22 it leaks like crazy 2008-09-12 01:22 write nasty emails to xorg 2008-09-12 01:22 tell them you want your money back 2008-09-12 01:23 kernel is using an unconscionable amount of buffers 2008-09-12 01:23 metadata is supposed to be small. Buffers -> metadata 2008-09-12 01:23 you can go complain on #mm 2008-09-12 01:23 tell peterz to do something ;-) 2008-09-12 01:23 that might be a result of an encrypted hard drive 2008-09-12 01:24 probably 2008-09-12 01:24 well 2008-09-12 01:24 dodgy encryption layer 2008-09-12 01:24 right 2008-09-12 01:25 what's the encryption method, craptoloop or dm-crapt? 2008-09-12 01:25 dm-crypt 2008-09-12 01:25 complain on the dm-devel list 2008-09-12 01:25 there 2008-09-12 01:25 got it all sorted ;-) 2008-09-12 01:25 :) 2008-09-12 01:49 incremental refcount block update cost <- I love English 2008-09-12 01:49 don't you MaZe? 2008-09-12 01:51 all that stuff in the refcount post was actually designed during the skate today 2008-09-12 01:51 the same skate that resulted in "rollerbladers are allowed" in the sk8board park 2008-09-12 01:52 so it was a good skate, all things considered 2008-09-12 01:52 g'night 2008-09-12 01:57 "Meanwhile, Rockbox has performed a valuable service for Debian developers who would otherwise have to struggle to find a project with longer release cycles than their own." hah 2008-09-12 01:59 :-) 2008-09-12 01:59 what is rockbox?
different firmware for your iPod 2008-09-12 02:03 or other 'mp3 player'-class devices 2008-09-12 02:04 (with additional functionality in mind) 2008-09-12 02:09 ACTION talks about b-tree parallelization with flips 2008-09-12 02:09 I was thinking about how something like RCU could be integrated into a b-tree. I don't know the specifics of a b-tree per se other than it's a tree that's flatter and better suited for storage 2008-09-12 02:11 flips: so I was thinking about per-inode processing if we decided to parallelize it on that basis 2008-09-12 02:11 we don't have RCU in userspace 2008-09-12 02:11 but we need locking in userspace 2008-09-12 02:11 also: RCU has some scary artifacts 2008-09-12 02:11 file locking would have to be done on a per-inode basis 2008-09-12 02:12 I think it's a case of, do some decent spinlock + mutex work, then try RCU 2008-09-12 02:12 not RCU first 2008-09-12 02:12 RCU is weird when it goes wrong 2008-09-12 02:12 so that code would have to be split up in that manner if you do it that way 2008-09-12 02:12 yeah, I know 2008-09-12 02:12 so are spinlocks etc, but the weirdness is a lot easier to grasp 2008-09-12 02:12 you have to validate the data and stuff before using it 2008-09-12 02:12 making sure that it's not stale 2008-09-12 02:12 per-inode is too coarse 2008-09-12 02:13 across quiescence periods 2008-09-12 02:13 well, what then ?
simple spinlocks and mutexes 2008-09-12 02:13 decide how many 2008-09-12 02:13 what order acquired 2008-09-12 02:13 for how long 2008-09-12 02:13 what granularity of data protected 2008-09-12 02:14 estimate contention 2008-09-12 02:14 decide where rw is appropriate 2008-09-12 02:14 where trylock works 2008-09-12 02:14 then think about per-CPU 2008-09-12 02:14 problem here is that rwlocks suck badly since they still depend on an atomic operation, which limits scalability; that is why some kind of per-CPU-ism is useful 2008-09-12 02:14 at first, pretend bouncing costs nothing 2008-09-12 02:15 ok 2008-09-12 02:15 when it's working reliably and bouncing starts to show in the profile (because everything else is so fast) then take anti-bounce measures 2008-09-12 02:15 let's think, how do we want to break this up ? 2008-09-12 02:15 what are we protecting and under what circumstances ? 2008-09-12 02:16 there should be any number of concurrent readers and writers allowed in one file inode at the same time 2008-09-12 02:16 same with the inode table 2008-09-12 02:16 they will be partitioned by subtree 2008-09-12 02:16 higher levels of the tree can have rwlocks 2008-09-12 02:17 I think 2008-09-12 02:17 at the low level, a simple mutex or spinlock is better 2008-09-12 02:17 what parts of the subtree ? 2008-09-12 02:17 =and how are they realted to the inode itself ? 2008-09-12 02:17 and how are they related to the inode itself ? 2008-09-12 02:17 a file index tree descends from the inode table 2008-09-12 02:18 think 50-petabyte file 2008-09-12 02:18 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3 2008-09-12 02:18 well, it depends on what you're guarding, we have to define the relationship first 2008-09-12 02:18 to get the sense of how many reads/writes can be in it at the same time 2008-09-12 02:18 let's start from the beginning 2008-09-12 02:18 what happens on a file open ?
guarding changes to the index leaf nodes, which is to say, the block pointers, and later extents 2008-09-12 02:18 and define the common operations 2008-09-12 02:18 open, read, write, close 2008-09-12 02:19 on file open we first look in the directory file 2008-09-12 02:19 find the inode number 2008-09-12 02:19 then probe into the inode table 2008-09-12 02:19 that's a flat file, right ? 2008-09-12 02:19 find the inode table block, and the inode in it 2008-09-12 02:19 the directory? 2008-09-12 02:19 currently flat 2008-09-12 02:19 diretory file 2008-09-12 02:19 directory file 2008-09-12 02:19 later will have a btree mapped into the flat file 2008-09-12 02:19 has its own locking considerations 2008-09-12 02:19 I'm assuming that it's a specific inode on the file system 2008-09-12 02:20 what is? 2008-09-12 02:20 we have two structures so far right ? 2008-09-12 02:20 1) directory map file 2008-09-12 02:20 2) b-tree 2008-09-12 02:20 not quite like that 2008-09-12 02:20 tux3 is a two-level btree structure 2008-09-12 02:20 top level btree is the inode table 2008-09-12 02:21 from the inode table descend some large number of file index btrees 2008-09-12 02:21 a directory is the leaves of one of those btrees 2008-09-12 02:21 that is, the data blocks 2008-09-12 02:21 the leaves of a file index btree actually contain pointers to data blocks 2008-09-12 02:22 so we go probing around in some directory, taking the same locks as we would for any file 2008-09-12 02:22 ACTION reads 2008-09-12 02:22 that is, locking various levels of the index btree of the directory file 2008-09-12 02:23 once we find a data block we read it into the page cache and drop our locks 2008-09-12 02:23 maybe not all of them, maybe just up to some level 2008-09-12 02:23 well 2008-09-12 02:24 that is a little tricky, because the Linux generic_file_read etc. functions don't work that way 2008-09-12 02:24 they generally cause the filesystem to walk its index tree over and over again, for each block
sucks 2008-09-12 02:24 ACTION is a bit confused 2008-09-12 02:24 ACTION thinks 2008-09-12 02:24 we don't need to worry about that 2008-09-12 02:25 for the moment we only need to be able to dive down into the btree and find a pointer to some data block 2008-09-12 02:25 see inode.c 2008-09-12 02:25 "filemap_blockio" 2008-09-12 02:26 most of the work is done by "probe" 2008-09-12 02:26 probe is where most of the locking action will happen 2008-09-12 02:27 so a file is a b-tree ? 2008-09-12 02:27 which is lower in level than the inode b-tree ? 2008-09-12 02:27 that's the relationship ? correct ? 2008-09-12 02:27 http://tux3.org/tux3?f=6ea2692d2839;file=user/test/btree.c 2008-09-12 02:27 see probe in there 2008-09-12 02:27 a file is _indexed_ by a btree 2008-09-12 02:27 that's in the lower level right ? 2008-09-12 02:28 ACTION looks 2008-09-12 02:28 a file lives in data blocks that are pointed to by pointers that live in the leaves of a btree, called a data index btree 2008-09-12 02:28 the leaves of that btree are called dleaves 2008-09-12 02:28 see dleaf.c 2008-09-12 02:29 the situation with dleaf.c is pretty simple 2008-09-12 02:29 we can protect an entire dleaf as one logical entity 2008-09-12 02:30 that covers about 500 file data blocks 2008-09-12 02:30 which is an OK granularity 2008-09-12 02:30 top-level b-tree right ? 2008-09-12 02:30 what top-level btree? 2008-09-12 02:30 a dtree is a second-level btree 2008-09-12 02:30 you have an inode b-tree and a data index b-tree, correct ?
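Protecting an entire dleaf as one logical entity means one lock guards every block pointer in that leaf, roughly 500 data blocks per lock. The sketch below models that with a pthread mutex. `struct dleaf`, `DLEAF_ENTRIES` and the helper names are hypothetical and deliberately ignore the real on-disk dleaf format in dleaf.c; only the lock granularity is the point.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical in-memory dleaf: one mutex protects all of its block
 * pointers, i.e. about 500 file data blocks per lock. */
#define DLEAF_ENTRIES 500

struct dleaf {
    pthread_mutex_t lock;
    uint64_t base;                   /* first logical block covered */
    uint64_t block[DLEAF_ENTRIES];   /* physical block, 0 = hole */
};

static void dleaf_init(struct dleaf *leaf, uint64_t base)
{
    pthread_mutex_init(&leaf->lock, NULL);
    leaf->base = base;
    for (int i = 0; i < DLEAF_ENTRIES; i++)
        leaf->block[i] = 0;
}

static int dleaf_covers(const struct dleaf *leaf, uint64_t logical)
{
    return logical >= leaf->base && logical - leaf->base < DLEAF_ENTRIES;
}

/* Map logical -> physical under the leaf lock; 0 means hole or not covered. */
static uint64_t dleaf_lookup(struct dleaf *leaf, uint64_t logical)
{
    uint64_t phys = 0;
    pthread_mutex_lock(&leaf->lock);
    if (dleaf_covers(leaf, logical))
        phys = leaf->block[logical - leaf->base];
    pthread_mutex_unlock(&leaf->lock);
    return phys;
}

/* Install a mapping under the leaf lock; -1 if this leaf does not cover it. */
static int dleaf_set(struct dleaf *leaf, uint64_t logical, uint64_t phys)
{
    int err = -1;
    pthread_mutex_lock(&leaf->lock);
    if (dleaf_covers(leaf, logical)) {
        leaf->block[logical - leaf->base] = phys;
        err = 0;
    }
    pthread_mutex_unlock(&leaf->lock);
    return err;
}
```

Readers and writers working on different dleaves of the same large file never contend. That is the point of choosing the dleaf, and not the whole dtree, as the unit.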
the top-level btree is the inode table 2008-09-12 02:30 right 2008-09-12 02:30 I'm just trying to understand the terminology here 2008-09-12 02:30 ok, good 2008-09-12 02:30 that's what I though 2008-09-12 02:30 thought 2008-09-12 02:30 itree vs dtree 2008-09-12 02:31 good 2008-09-12 02:31 yeah, good terminology 2008-09-12 02:31 thanks 2008-09-12 02:31 itree->dtree 2008-09-12 02:31 right 2008-09-12 02:31 terminology is important 2008-09-12 02:31 agreed 2008-09-12 02:31 it's what I figured you said in the first place, but I had to be sure 2008-09-12 02:31 right, protect a dtree entirely with a lock 2008-09-12 02:31 http://kerneltrap.org/Linux/Tux3_Hierarchical_Structure 2008-09-12 02:32 some of this is wrong now 2008-09-12 02:32 ACTION reads 2008-09-12 02:32 dropped the volume table, moved the free map inside the itree 2008-09-12 02:32 update it before the next tux3 university 2008-09-12 02:32 as a normal file 2008-09-12 02:32 that's hard 2008-09-12 02:32 that's on somebody else's site 2008-09-12 02:32 but I can post something on tux3.org 2008-09-12 02:32 ACTION really appreciates the help in learning this from flips 2008-09-12 02:33 do you have an allocation map that's shared 2008-09-12 02:33 ? 2008-09-12 02:33 at least you can see the inode table / data index table relationship there 2008-09-12 02:33 but it is obscured by the volume table, which I determined to be useless 2008-09-12 02:33 that's a potentially huge problem for contention with regard to the allocator 2008-09-12 02:33 allocation map? 2008-09-12 02:34 there is an allocation bitmap 2008-09-12 02:34 block allocation map 2008-09-12 02:34 which is a normal file 2008-09-12 02:34 well, how do you modify it, say, under heavy delete or data creation pressure ? 2008-09-12 02:34 sb->bitmap in inode.c 2008-09-12 02:34 doesn't it need a lock around it ? 2008-09-12 02:34 currently there is no locking 2008-09-12 02:34 or concurrency 2008-09-12 02:34 soon 2008-09-12 02:35 well, doesn't it need it ?
but it is just a normal file 2008-09-12 02:35 lock it with the same granularity 2008-09-12 02:35 there's a lot of activity there so I expect it to be heavily hit 2008-09-12 02:35 sure 2008-09-12 02:35 same granularity as what ? 2008-09-12 02:35 other files too 2008-09-12 02:35 but! 2008-09-12 02:35 there is a difference with tux3 2008-09-12 02:35 ok 2008-09-12 02:35 tux3 has this way of logging changes to the bitmaps 2008-09-12 02:35 it doesn't have to lock, write block, wait 2008-09-12 02:35 ok 2008-09-12 02:35 that kind of thing 2008-09-12 02:35 oh nice 2008-09-12 02:36 so locks on the bitmap are just page cache locks 2008-09-12 02:36 deltas to the allocation map are just appended 2008-09-12 02:36 that is, most likely actually locking pages when we get to the kernel 2008-09-12 02:36 or we could lock buffers 2008-09-12 02:36 locking pages is a little faster 2008-09-12 02:36 what about during concurrent access against an online checker that needs to know about all of the appended logs ? 2008-09-12 02:36 yes, deltas to the allocation map are just logged 2008-09-12 02:37 and every now and then we pour a bunch of them into the allocation map and write it out 2008-09-12 02:37 defer those checks until the log has been commit to the disk and then restart it ? 2008-09-12 02:37 committed 2008-09-12 02:37 the allocation map always has the most recent version of the allocation 2008-09-12 02:37 in buffers 2008-09-12 02:37 in memory 2008-09-12 02:37 because, say, you want to verify whether the data blocks that some indirect mapping points to are allocated or not 2008-09-12 02:37 so an online check, ah, needs to check the disk image 2008-09-12 02:37 not the cached image 2008-09-12 02:37 right?
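The logged-delta scheme described above (the in-memory allocation map is always current, changes are appended to a log, and the log is periodically poured into the map that gets written out) can be modelled with two bitmaps and a delta array. This is a toy sketch under loud assumptions: `struct alloc_map`, `alloc_change`, `alloc_flush`, the sizes, and the `ondisk` array standing in for the written-out bitmap are all hypothetical, and no real I/O or on-disk log format is involved.

```c
#include <stdint.h>

/* Toy model of logging allocation-map changes; all names and sizes are
 * hypothetical.  "ondisk" stands in for the bitmap as last written out. */
#define MAP_BLOCKS 4096
#define LOG_MAX    256

struct alloc_delta { uint32_t block; uint8_t allocated; };

struct alloc_map {
    uint8_t cached[MAP_BLOCKS / 8];   /* in memory: always the latest state */
    uint8_t ondisk[MAP_BLOCKS / 8];   /* state as of the last flush */
    struct alloc_delta log[LOG_MAX];  /* deltas not yet poured into ondisk */
    unsigned nlog;
};

static void set_bit(uint8_t *map, uint32_t b, int on)
{
    if (on)
        map[b / 8] |= (uint8_t)(1u << (b % 8));
    else
        map[b / 8] &= (uint8_t)~(1u << (b % 8));
}

static int test_bit(const uint8_t *map, uint32_t b)
{
    return (map[b / 8] >> (b % 8)) & 1;
}

/* Pour the logged deltas, in order, into the "written out" bitmap. */
static void alloc_flush(struct alloc_map *m)
{
    for (unsigned i = 0; i < m->nlog; i++)
        set_bit(m->ondisk, m->log[i].block, m->log[i].allocated);
    m->nlog = 0;
}

/* Record a change: update memory at once, append a delta for later. */
static void alloc_change(struct alloc_map *m, uint32_t block, int allocated)
{
    set_bit(m->cached, block, allocated);
    if (m->nlog == LOG_MAX)
        alloc_flush(m);
    m->log[m->nlog++] = (struct alloc_delta){ block, (uint8_t)(allocated != 0) };
}
```

In this model the "disk image" an online checker would need is `ondisk` plus the pending `log`, which is why the unflushed deltas have to be taken into account when checking.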
2008-09-12 02:38 pretty hard to do otheriwse 2008-09-12 02:38 anway, that's not the immediate problem 2008-09-12 02:38 the immediate problem is just tohave fast, concurrent access to everything 2008-09-12 02:38 if it's not and a log is being committed, we should delay it until that log has been committed ? 2008-09-12 02:38 just thinking out loud 2008-09-12 02:38 if what is not? 2008-09-12 02:38 what ? the log ? 2008-09-12 02:39 the log itself 2008-09-12 02:39 "if it's not"you said 2008-09-12 02:39 don't know what "it" is 2008-09-12 02:39 is there a scenario where the online checking of that portion of the disk and ...on that'll never happen 2008-09-12 02:39 because of the atomic commit 2008-09-12 02:39 it's should be consistent at that point from previous commits 2008-09-12 02:40 we don't really need to check logs that are being committed to disk and wait for them to complete 2008-09-12 02:40 or do we ? 2008-09-12 02:40 we do 2008-09-12 02:40 because the logs form a promise of what the "real" disk image "should" look like 2008-09-12 02:40 yeah, well then we have to lock them down or something like that 2008-09-12 02:40 so we need to take it into account during checking 2008-09-12 02:40 but checking is far in the future 2008-09-12 02:41 under, say a rwlock lock, reader side 2008-09-12 02:41 at least 3 months 2008-09-12 02:41 probably 4 2008-09-12 02:41 ok 2008-09-12 02:41 worth thinking about 2008-09-12 02:41 let's do beer on it 2008-09-12 02:41 I'll think about it on my next skate, if refcounting is done ;-) 2008-09-12 02:42 going back to the bitmap 2008-09-12 02:42 so, at least each bit has to be protected 2008-09-12 02:42 we do scan, find, change 2008-09-12 02:42 and that scan/find/change to allocate a block has to be under a spinlock 2008-09-12 02:43 flips: am I being helpful or not ? 2008-09-12 02:43 or in userspace, under a pthread mutex 2008-09-12 02:43 or saying stupid irrelevant things ? 
just checking 2008-09-12 02:43 of course 2008-09-12 02:43 ok 2008-09-12 02:43 I haven't been required to be precise about this up till now 2008-09-12 02:43 or deal with somebody who had written nontrivial locking 2008-09-12 02:43 we'll I hope I'm helping 2008-09-12 02:43 yep 2008-09-12 02:43 anyway, the allocation bitmap is a good place to start 2008-09-12 02:44 because there is a pretty simple situation there 2008-09-12 02:44 I think you can definitely isolate a dtree using an individual dtree lock 2008-09-12 02:44 once you know your bitmap block isn't going away 2008-09-12 02:44 well, one lock per dtree is way too crude 2008-09-12 02:44 that's good, but you have to think about the upward relationship between than than itree 2008-09-12 02:44 actually you don't 2008-09-12 02:44 you can treat it as individual blocks 2008-09-12 02:44 is the lock against the dtree sufficient to protect the link in the itree pointing to it ? 2008-09-12 02:45 stuff like that 2008-09-12 02:45 you lock your way down through the btree until you get to the datablock, lock the data block and let everything else go 2008-09-12 02:45 ok, let's define what a read would look like through that. 2008-09-12 02:45 ok, right 2008-09-12 02:45 you look up the inode in the itree 2008-09-12 02:45 you get it 2008-09-12 02:45 what next ? 2008-09-12 02:45 it points to a dtree 2008-09-12 02:45 well that's a good point 2008-09-12 02:45 so you want to delete a data block 2008-09-12 02:45 that means you have to lock the data block 2008-09-12 02:46 so I think that's clear, right? 2008-09-12 02:46 yes 2008-09-12 02:46 that means, the read has to be off it 2008-09-12 02:46 reader 2008-09-12 02:46 what do you lock then ? the dtree or some part of the dtree ? 2008-09-12 02:46 you lock the block 2008-09-12 02:46 how would region locking look like ? 
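The allocation-bitmap case described above comes down to a scan/find/change that must be atomic. Here is a userspace sketch that holds one pthread mutex across the whole allocation, so two threads can never claim the same free bit. The names `bitmap_alloc`, `alloc_block` and `free_block` are illustrative assumptions, as is the goal-based wraparound scan; this is not tux3's allocator.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define BITMAP_BLOCKS 4096   /* hypothetical number of blocks tracked */

struct bitmap_alloc {
	pthread_mutex_t lock;              /* guards the whole scan/find/change */
	uint8_t bits[BITMAP_BLOCKS / 8];   /* one bit per block, 1 = in use */
};

/* Find a free block at or after goal (wrapping), mark it used, return it.
 * Returns -1 if every block is in use. */
long alloc_block(struct bitmap_alloc *map, uint32_t goal)
{
	long found = -1;
	pthread_mutex_lock(&map->lock);
	for (uint32_t i = 0; i < BITMAP_BLOCKS; i++) {
		uint32_t block = (goal + i) % BITMAP_BLOCKS;
		uint8_t mask = (uint8_t)(1u << (block & 7));
		if (!(map->bits[block >> 3] & mask)) {
			map->bits[block >> 3] |= mask;   /* change, still under the lock */
			found = block;
			break;
		}
	}
	pthread_mutex_unlock(&map->lock);
	return found;
}

void free_block(struct bitmap_alloc *map, uint32_t block)
{
	assert(block < BITMAP_BLOCKS);
	pthread_mutex_lock(&map->lock);
	map->bits[block >> 3] &= (uint8_t)~(1u << (block & 7));
	pthread_mutex_unlock(&map->lock);
}
```

In the kernel, the mutex would give way to the spinlock or page/buffer lock mentioned in the discussion. The structure of the critical section stays the same.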
2008-09-12 02:46 region locking looks like locking a subtree node 2008-09-12 02:46 then you have to wait for _every_ other lock to go away 2008-09-12 02:47 not a good idea 2008-09-12 02:47 why do you want to lock a region? 2008-09-12 02:48 posix semantics or something like that 2008-09-12 02:48 totally different locking level 2008-09-12 02:48 can't you lock a range in the file under posix ? 2008-09-12 02:48 waaay up in the vfs 2008-09-12 02:48 layered, independent 2008-09-12 02:49 also, the linux posix locking code blows 2008-09-12 02:49 coarse grained as hell 2008-09-12 02:49 true, but you can bypass it 2008-09-12 02:49 single fucking lock 2008-09-12 02:49 yup, and a linear list 2008-09-12 02:49 blows 2008-09-12 02:49 but you still do it at the same level 2008-09-12 02:49 that's not our concern now 2008-09-12 02:50 or maybe I'm just stuck in the wrong mindset 2008-09-12 02:50 could be 2008-09-12 02:50 anyway, at least we can let that suck exaclty as it always has 2008-09-12 02:50 we won't lose a benchmark showdown for that reason 2008-09-12 02:50 who uses posix locks anyway? ;) 2008-09-12 02:52 there is a case where you want to lock a region 2008-09-12 02:52 cluster fs 2008-09-12 02:52 but that's not us 2008-09-12 02:52 yet 2008-09-12 02:53 ok 2008-09-12 02:53 let's continue 2008-09-12 02:53 how do you lock the data block ? 2008-09-12 02:53 I guess for tux3 we can think of a single block as our unit of locking 2008-09-12 02:53 this is for a read remember... 2008-09-12 02:53 in kernel? 2008-09-12 02:53 take the block lock 2008-09-12 02:53 it's a bitspin lock as I recall 2008-09-12 02:54 same with the page lock 2008-09-12 02:54 it's fast enough for this purpose 2008-09-12 02:54 we'll its all something to think about 2008-09-12 02:54 in userspace 2008-09-12 02:54 pthread mutex, we will put one in each buffer 2008-09-12 02:54 that's pretty nasty 2008-09-12 02:54 so you lock the mutex in the buffer 2008-09-12 02:54 because? 
2008-09-12 02:54 how big of a file chunk are we deleting ? 2008-09-12 02:55 ah, delete 2008-09-12 02:55 well, nice thing about truncate is, we don't have to wait for it 2008-09-12 02:55 we can just mark the inode as "truncated" and we're done 2008-09-12 02:56 we don't even have to update the inode 2008-09-12 02:56 just promise to in our log 2008-09-12 02:56 or do we need to lock during the read as well ? 2008-09-12 02:56 and take our sweek time, walking through the dtree, taking locks, freeing blocks 2008-09-12 02:56 on a block basis 2008-09-12 02:56 we need to lock on read, yes 2008-09-12 02:56 on a block basis 2008-09-12 02:56 which ? what does the lock hierarchy look like 2008-09-12 02:56 ? 2008-09-12 02:57 just long enough to enter the block into the cache 2008-09-12 02:57 do we lock the itree ? dtree ? what ? 2008-09-12 02:57 we work our way down the levels of the two trees, taking locks and releasing them 2008-09-12 02:57 we only hold a lock long enough to know that we can see the next object in cache 2008-09-12 02:57 do we take reader locks along the way ? 2008-09-12 02:57 if we don't see it in cache, drop everything, read it in, start over fromthe top 2008-09-12 02:58 simple mined algorithm 2008-09-12 02:58 a starting point 2008-09-12 02:58 and hold them across the entire operation ? 2008-09-12 02:58 let's definte this 2008-09-12 02:58 only across the operation of finding the next level down in the cache 2008-09-12 02:58 soon as we find that, we lock it, release the parent 2008-09-12 02:58 be specific 2008-09-12 02:58 make sense? 2008-09-12 02:58 I thought that was specific 2008-09-12 02:59 hold the itree lock until we get the specific dtree ? 
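The deferred truncate described above amounts to recording a promise and returning at once. A background pass later walks the dtree, taking block locks and freeing blocks. Below is a minimal sketch of just the promise queue. The `trunc_log` ring buffer and its functions are hypothetical, since tux3's actual log format is not shown here, and locking is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PENDING 32   /* hypothetical queue depth; a power of two */

struct trunc_intent {
	uint64_t inum;       /* inode to shrink */
	uint64_t new_size;   /* 0 for a full truncate */
};

struct trunc_log {
	struct trunc_intent pending[MAX_PENDING];
	unsigned head, tail; /* ring buffer indices; tail - head = queued */
};

/* Front end: record the promise and return; the dtree is not touched. */
int truncate_deferred(struct trunc_log *tlog, uint64_t inum, uint64_t new_size)
{
	if (tlog->tail - tlog->head == MAX_PENDING)
		return -1;   /* full: the back end must catch up first */
	tlog->pending[tlog->tail++ % MAX_PENDING] =
		(struct trunc_intent){ inum, new_size };
	return 0;
}

/* Back end: take the oldest promise. Returns 1 if one was dequeued, else 0. */
int next_truncate(struct trunc_log *tlog, struct trunc_intent *out)
{
	assert(tlog->tail - tlog->head <= MAX_PENDING);
	if (tlog->head == tlog->tail)
		return 0;
	*out = tlog->pending[tlog->head++ % MAX_PENDING];
	return 1;
}
```

After dequeuing, the back end would walk the dtree for `out->inum`, locking and freeing one block at a time.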
2008-09-12 02:59 there is no itree lock 2008-09-12 02:59 no, more specific :) 2008-09-12 02:59 ok, lock the root of the itree 2008-09-12 02:59 ok 2008-09-12 02:59 that is, look for it in cache 2008-09-12 02:59 ok 2008-09-12 02:59 if it's not there, issue a read 2008-09-12 02:59 block until it is 2008-09-12 02:59 to load the portion of the itree, right ? 2008-09-12 02:59 then block until we own the read lock 2008-09-12 02:59 have a read lock 2008-09-12 03:00 nope 2008-09-12 03:00 wait... 2008-09-12 03:00 to probe down where we want to go 2008-09-12 03:00 let me summarize 2008-09-12 03:00 yes, that stops everybody 2008-09-12 03:00 probing the itree 2008-09-12 03:00 well 2008-09-12 03:00 it stops writers 2008-09-12 03:00 yes 2008-09-12 03:00 because we have a read lock on the root 2008-09-12 03:00 right 2008-09-12 03:00 we aren't going to keep it long 2008-09-12 03:00 that would be unfriendly 2008-09-12 03:00 yes 2008-09-12 03:01 so what we do is, we find the next index block down inthe inode table index tree 2008-09-12 03:01 check its in cache 2008-09-12 03:01 if so, take a read lock on it 2008-09-12 03:01 if not, drop the root lock, issue a read, block on it, start again at the root 2008-09-12 03:01 obviously this may never terminate ;-) 2008-09-12 03:02 but we have other problems if it doesn't 2008-09-12 03:02 so we look up a inode; reader lock the itree; if it's not there issue a read to load that in, release all of the above locks until that block's wait queue wakes; read that block, get that dtree link 2008-09-12 03:02 while holding the itree reader lock 2008-09-12 03:02 we don't read lock the itree 2008-09-12 03:02 correct ? 2008-09-12 03:02 we read lock the root of the itree 2008-09-12 03:02 big difference 2008-09-12 03:02 ok 2008-09-12 03:02 so let's say the itree has seven levels of index 2008-09-12 03:02 big itree 2008-09-12 03:02 what's the difference ? 
2008-09-12 03:02 ok 2008-09-12 03:03 we start by locking the root 2008-09-12 03:03 then we lock level one index, and drop the root lock 2008-09-12 03:03 then lock level 2 index block, and drop the level 1 2008-09-12 03:03 and so on 2008-09-12 03:03 down to level 7 2008-09-12 03:03 then we start the same process onthe dtree 2008-09-12 03:03 make sense? 2008-09-12 03:03 or propagate downwards, releasing a lock 2008-09-12 03:03 kind of scary 2008-09-12 03:03 ok, it's not that scary 2008-09-12 03:04 big reason: we wil normally keep hitting the same inode table block several times 2008-09-12 03:04 so we keep a "cursor" 2008-09-12 03:04 right, advance the cursor as needed 2008-09-12 03:04 what about rebalancing operations ? how does this effect it ? 2008-09-12 03:04 got to worry about how cursors interact with write locks on the itree 2008-09-12 03:04 but then 2008-09-12 03:04 that's why we're talking about it 2008-09-12 03:05 I don't know how to manipulate it other than with a big coarse grained lock as this time 2008-09-12 03:05 ok when you want to rebalance, delete, insert, split, whatever, you need a write lock 2008-09-12 03:05 on the parent and on the blocks being changed 2008-09-12 03:05 so you do the same thing 2008-09-12 03:05 I simply don't know enough about b-trees to know how to downward propagate the lock 2008-09-12 03:05 cursor 2008-09-12 03:05 me neither 2008-09-12 03:05 haven't done this before 2008-09-12 03:06 it's jsut brainwork though 2008-09-12 03:06 no magic 2008-09-12 03:06 ok, that's a big deal 2008-09-12 03:06 the expert on tree locking that I know of is peterz 2008-09-12 03:06 what kind of tree? 
2008-09-12 03:06 he's done all sorts of shit 2008-09-12 03:06 radix tree and other things 2008-09-12 03:06 I'll ping him 2008-09-12 03:06 higly concurrent trees 2008-09-12 03:06 radix tree is pretty simple 2008-09-12 03:06 highly concurrent trees 2008-09-12 03:06 compared to a filesystem index 2008-09-12 03:07 he's the best person for the job that I know of 2008-09-12 03:07 yes, I'll point peterz at it 2008-09-12 03:07 you're not bad 2008-09-12 03:07 you're asking the right questions 2008-09-12 03:07 he might not have time, but I don't think that your current track of fine graining the system upfront is the best solution 2008-09-12 03:07 ah 2008-09-12 03:08 we have another knob we can tweak 2008-09-12 03:08 you should consider seriously per cpu-ing it if possible or faking it userspace 2008-09-12 03:08 there is also a refcount on each buffer 2008-09-12 03:08 we can get highly concurrent reads with RCU, that's a given 2008-09-12 03:08 per-cpuing it before figuring out how to do it with normal locks would not be wise 2008-09-12 03:08 1) walk 2) run 2008-09-12 03:08 it's just a matter of how we can modify it to apply to your current atomic log at that time 2008-09-12 03:09 ok, see that recount comment 2008-09-12 03:09 maybe the use of an atomic counter would help to version the logs for both RCU tree nodes and the atomic disk log 2008-09-12 03:09 very important 2008-09-12 03:09 forget rcu 2008-09-12 03:09 rcu is braindamage 2008-09-12 03:10 when we want it rcu'd, we'll hand it to the rcu guys 2008-09-12 03:10 it can be but it's also a bad ass algorithm 2008-09-12 03:10 it's a real use of time 2008-09-12 03:10 depends on the kind of guarantees you need 2008-09-12 03:10 not consistent with getting a solid prototype up 2008-09-12 03:10 ok 2008-09-12 03:10 I guess basic thread safety is first 2008-09-12 03:10 anyway, the question, what happens when the itree geometry needs to change 2008-09-12 03:11 so we have all these readers walking down the tree, that's great 
2008-09-12 03:11 and they release their locks as they go so somebody can come behind and maybe go down a different subtree 2008-09-12 03:11 very nice already 2008-09-12 03:11 but 2008-09-12 03:11 how do you change the tree geometry? 2008-09-12 03:11 well 2008-09-12 03:12 you can actuall do it when there are readers buzzing away inside subtrees that you're moving around 2008-09-12 03:12 that is cool 2008-09-12 03:12 you just need to write lock the parent and read lock the children, so you know that the tasks ahead of you have gotten off the parent 2008-09-12 03:13 then you can change the parent 2008-09-12 03:13 make sense? 2008-09-12 03:13 you can for example, split the parent 2008-09-12 03:13 and then you may have to change the parent's parent 2008-09-12 03:13 well 2008-09-12 03:13 fun 2008-09-12 03:14 ACTION reads 2008-09-12 03:14 you might have to check the path to find out how high up the splits will go, and get write locks on that whole chain 2008-09-12 03:14 I was just talking to peterz 2008-09-12 03:14 asked him the same questions we were talking about here 2008-09-12 03:14 he's not going to be around much since he's headed to plumber's 2008-09-12 03:14 the comparison of answers must be fascinating 2008-09-12 03:15 I think that itree node deletion needs to be tied to file handle semantics somehow 2008-09-12 03:15 tell him good luck with the plumbing, there is a lot of shit in those pipes 2008-09-12 03:15 na, forget that 2008-09-12 03:15 good 2008-09-12 03:15 it's way wrong ;) 2008-09-12 03:16 peterz used a lock in a linked list node to protect link modification 2008-09-12 03:16 so think about why your deleting an itree node 2008-09-12 03:16 yeah, you're completely removing it 2008-09-12 03:16 but why? 2008-09-12 03:16 it's not really what you want 2008-09-12 03:17 I'm agreening with you 2008-09-12 03:17 it's because you're coalescing the itree, and why are you doing that? 
2008-09-12 03:17 agreeing 2008-09-12 03:17 I know 2008-09-12 03:17 I'm doing rhetoric 2008-09-12 03:17 ok 2008-09-12 03:17 so you're doing that because you've just delete masses of files and you want to tighten up the inode table tree a little 2008-09-12 03:17 actually, this is quite optional 2008-09-12 03:18 we don't really need to do that 2008-09-12 03:18 particularly if we tend to reuse the same inode numbers in the not too distant future 2008-09-12 03:18 it's only if we are determined to use completely different ones, for no good reason, that we need to fiddle the geometry of the itree on delete 2008-09-12 03:19 anway 2008-09-12 03:19 let's assume that we do want to be tidy and coalesce the itree frequently, even if we are not required to 2008-09-12 03:20 that means merging nodes in general 2008-09-12 03:20 just what I was talking about 2008-09-12 03:20 well 2008-09-12 03:20 no, I was thinking of splitting above 2008-09-12 03:20 merging is more forgiving as far as locking the access path goes 2008-09-12 03:20 tough problem 2008-09-12 03:21 I'd go with something simple first 2008-09-12 03:21 not really, you only need to write lock the parent and the two blocks being merged 2008-09-12 03:21 this is too much for a prototype 2008-09-12 03:21 but the first thing to do is to define specifically how a coarse grained set of locks would work on it 2008-09-12 03:21 it is traditional to start with a single global lock on each btree 2008-09-12 03:21 and then propagate downwards 2008-09-12 03:21 and find out how badly that sucks 2008-09-12 03:21 right 2008-09-12 03:21 we know in advance it sucks too much 2008-09-12 03:21 so why bother? 
2008-09-12 03:21 we have lockstat so we can get a good idea of how sucky it is and it will be sucky 2008-09-12 03:21 maybe in next week's prototype 2008-09-12 03:21 then that's it 2008-09-12 03:22 ok 2008-09-12 03:22 the biggest focus at this time is to get your prototype working fully 2008-09-12 03:22 true 2008-09-12 03:22 with concurrency 2008-09-12 03:22 in user space 2008-09-12 03:22 had a slight change of philosophy there 2008-09-12 03:22 the best thing we can do is make provisions to do fine grained locking or per cpu-ification in the future easily 2008-09-12 03:22 when the fuse stuff landed 2008-09-12 03:22 not solve the entire problem upfront 2008-09-12 03:22 right 2008-09-12 03:22 well 2008-09-12 03:23 I don't think you mean by per-cpu what I mean 2008-09-12 03:23 but let's ask interesting questions and get help from folks like peterz 2008-09-12 03:23 per-cpu to me means replicating the relevant date per-cpu 2008-09-12 03:23 that's a big mess 2008-09-12 03:23 last resort 2008-09-12 03:23 all that 2008-09-12 03:24 no, 2008-09-12 03:24 it's about avoiding locking in the first place during an inode operation 2008-09-12 03:24 like how? 
2008-09-12 03:24 as much locking as possible under that operation 2008-09-12 03:25 like making the entire read path as per cpu as possible 2008-09-12 03:25 vfs is going to take i_sem, can't tell it not to 2008-09-12 03:25 we have to stick to the part we own 2008-09-12 03:25 yeah 2008-09-12 03:25 and are responsible for 2008-09-12 03:25 which is our indexing structures 2008-09-12 03:25 so we're already taking an inode lock of some sort 2008-09-12 03:25 different inode 2008-09-12 03:26 there's the struct inode, which the vfs locks, and the image of the inode on an inode table block, which we lock 2008-09-12 03:41 sleeping time 2008-09-12 03:46 ok 2008-09-12 03:46 night 2008-09-12 03:46 later flips sleep well 2008-09-12 03:46 ACTION is going to be up still 2008-09-12 03:47 night 2008-09-12 08:44 -!- tim_dimm(~timothyhu@adsl-67-114-40-138.dsl.scrm01.pacbell.net) has joined #tux3 2008-09-12 09:10 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-12 09:26 -!- eli(~elicriffi@66.249.86.209) has joined #tux3 2008-09-12 10:24 -!- kd(kdpict@118.94.54.179) has joined #tux3 2008-09-12 11:15 -!- kushal(kdpict@118.94.54.179) has joined #tux3 2008-09-12 11:39 -!- pgquiles(~pgquiles@229.Red-83-49-101.dynamicIP.rima-tde.net) has joined #tux3 2008-09-12 13:05 flips: 2008-09-12 13:05 05:09 < peterz> bh: not too hard - I send you a paper on that iirc 2008-09-12 13:05 05:09 < giel> not too hard implementation-wise, or time/space wise? 2008-09-12 13:05 05:10 < giel> complexity theory! 2008-09-12 13:05 05:10 < peterz> implementation wise :-) 2008-09-12 13:05 05:10 < peterz> the btree space/time considerations don't change 2008-09-12 13:05 05:11 < peterz> the thing that's hardest about the fine grain locking is the optimistic locking approach 2008-09-12 13:05 05:11 < peterz> you'd have to work out where upwards traversal stops on your way down 2008-09-12 13:05 which channel? 
2008-09-12 13:05 he's exactly right 2008-09-12 13:06 woke up thinking about precisely that 2008-09-12 13:06 this morning 2008-09-12 13:10 #offtopic2 2008-09-12 13:10 but he's traveling right now 2008-09-12 13:11 to KS and Plumbers 2008-09-12 13:11 it's not offtopic ;-) 2008-09-12 13:11 I'll invite peterz here 2008-09-12 13:11 better than #offtopic 2008-09-12 13:12 KS is going to be buzzing about tux3 ;-) 2008-09-12 13:12 lots of trash talking from the trash talkers 2008-09-12 13:13 KS has degenerated to mostly wanking 2008-09-12 13:13 very little tech gets done there any more 2008-09-12 13:13 just climbers getting fact time 2008-09-12 13:13 are they ? 2008-09-12 13:13 regarding buzz ? 2008-09-12 13:13 ? 2008-09-12 13:14 face time I mean 2008-09-12 13:14 oh are they a bunch of wankers now ? this is a publically logged channel keep in mind :) 2008-09-12 13:14 ah right 2008-09-12 13:14 well that's a public comment 2008-09-12 13:14 ACTION giggles 2008-09-12 13:14 not trying to make friends, eh ? :) 2008-09-12 13:14 never have gotten along well with wankers 2008-09-12 13:15 just me 2008-09-12 13:15 yeah, well, I can do with less political wanking and more changes into the kernel 2008-09-12 13:15 yep 2008-09-12 13:15 opensolaris helps focus on that 2008-09-12 13:15 not enough yet 2008-09-12 13:16 specificallly a couple of things I've been planning for years but never had the time to really do 2008-09-12 13:16 linux is losing "customers" to opensolaris 2008-09-12 13:16 it's a fact 2008-09-12 13:16 oh really ? 2008-09-12 13:16 not desktoppers, but datacenters 2008-09-12 13:16 backrooms 2008-09-12 13:16 the guys with money 2008-09-12 13:17 flips: have peterz resend you the paper, I don't know where it is for at the moment 2008-09-12 13:17 paper? 
2008-09-12 13:18 the paper on the topic regarding trees and locking 2008-09-12 13:18 would be nice 2008-09-12 13:18 ask him for it 2008-09-12 13:18 sure 2008-09-12 13:19 we have to do some minor changes to btree.c I think 2008-09-12 13:19 because it currently climbs the path when it has to split 2008-09-12 13:19 it has to descend instead, and it has to drop locks before doing that 2008-09-12 13:20 so it may find that somebody else has changed the object is was looking at when it gets back down 2008-09-12 13:20 the object can't be deleted fortunately 2008-09-12 13:20 because its caller must hold a reference 2008-09-12 13:20 it'll have to be locked in-roder 2008-09-12 13:20 in-order 2008-09-12 13:20 on the cache image 2008-09-12 13:20 what ever that is 2008-09-12 13:20 so that is the rule: you have to hold a ref on the cached object before you can delete the disk object 2008-09-12 13:21 and the ref count of the cached object must be equal to one 2008-09-12 13:21 good example of unwritten lore about the kernel 2008-09-12 13:21 books don't tell you that 2008-09-12 13:21 but anybody who is allowed to touch core vfs knows that 2008-09-12 13:21 fs hackers have to know it do, and often don't 2008-09-12 13:21 know it too that is 2008-09-12 13:22 locking order for a btree is simple 2008-09-12 13:22 root-to-leaf 2008-09-12 13:22 left-to-right if that granularity matters, which it doesn't 2008-09-12 13:22 so just root-to-leaf 2008-09-12 13:23 but resize_btree goes leaf-to-root, doesn't work 2008-09-12 13:31 ok, time to stop cleaning up and write some refcounting code 2008-09-12 13:31 no comments back on my post last night 2008-09-12 13:31 I thought folks would chew on that 2008-09-12 13:32 it's really core to tux3 performance in general 2008-09-12 13:32 not just atom refcounting 2008-09-12 13:33 hi. i just copied and pasted the irclogs from the tux3 university sessions. As I haven't seen them on the mailing list as of yet, should I post them? 
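The rule stated above can be written as a check: hold a reference on the cached object, and be its only holder, before deleting the disk object. The names `cached_obj`, `get_obj`, `put_obj` and `delete_disk_obj` are hypothetical. In this sketch the refusal path returns -1 rather than waiting for other holders to go away.

```c
#include <assert.h>
#include <pthread.h>

struct cached_obj {
	pthread_mutex_t lock;
	int refcount;    /* references held on the cached object */
	int on_disk;     /* 1 while the disk object exists */
};

void get_obj(struct cached_obj *obj)
{
	pthread_mutex_lock(&obj->lock);
	obj->refcount++;
	pthread_mutex_unlock(&obj->lock);
}

void put_obj(struct cached_obj *obj)
{
	pthread_mutex_lock(&obj->lock);
	assert(obj->refcount > 0);
	obj->refcount--;
	pthread_mutex_unlock(&obj->lock);
}

/*
 * Delete the disk object behind obj. The caller must hold a reference,
 * and deletion goes ahead only if that is the sole reference.
 * Returns 0 if deleted, -1 if other holders exist or it is already gone.
 */
int delete_disk_obj(struct cached_obj *obj)
{
	int ret = -1;
	pthread_mutex_lock(&obj->lock);
	assert(obj->refcount >= 1);   /* no reference held: caller bug */
	if (obj->refcount == 1 && obj->on_disk) {
		obj->on_disk = 0;         /* stand-in for freeing the disk blocks */
		ret = 0;
	}
	pthread_mutex_unlock(&obj->lock);
	return ret;
}
```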
2008-09-12 13:33 sure 2008-09-12 13:34 complete with all the swearing ;-) 2008-09-12 13:34 this is real life university 2008-09-12 13:34 i'll make sure of it :) 2008-09-12 13:34 might replace some of the bad words with @%$@# 2008-09-12 13:34 or not 2008-09-12 13:34 guess not :) 2008-09-12 13:34 whatever you think is right ;) 2008-09-12 13:34 nice nick 2008-09-12 13:35 well, it was datapunk when I was like 15 or so 2008-09-12 13:35 also good 2008-09-12 13:35 and as they tend to get shorte it now resembles a trekkie name 2008-09-12 13:35 but thanks 2008-09-12 13:35 you have piercings? or just virtual piercings? 2008-09-12 13:35 just virtual 2008-09-12 13:36 :) 2008-09-12 13:36 some of my best friends in berlin had some interesting piercings 2008-09-12 13:36 but for example, harald avoids it 2008-09-12 13:36 works better in the boardroom 2008-09-12 13:36 I was going to look at the reasons for it, but is the problem with deleting files known? 2008-09-12 13:37 frist I heard of it 2008-09-12 13:37 go ahead on it 2008-09-12 13:37 well, i don't particularly like them 2008-09-12 13:37 I wasn't very careful when I put that in 2008-09-12 13:37 ok, will do, after a little algebra session 2008-09-12 13:37 see you later 2008-09-12 13:37 wo wohnst du? [where do you live?] 2008-09-12 13:37 karlsruhe 2008-09-12 13:37 if you know it 2008-09-12 13:37 ah cool 2008-09-12 13:37 near SAS 2008-09-12 13:37 sure, been there a few times 2008-09-12 13:38 quite 2008-09-12 13:38 quiet 2008-09-12 13:38 just like the name 2008-09-12 13:38 lots of geeks in the area 2008-09-12 13:38 yep, they are 2008-09-12 13:38 CS is pretty strong 2008-09-12 13:38 suse not far away 2008-09-12 13:38 ibm 2008-09-12 13:38 not sas 2008-09-12 13:38 um 2008-09-12 13:38 um 2008-09-12 13:38 sap? 2008-09-12 13:38 right 2008-09-12 13:38 where I've been too 2008-09-12 13:39 there's a great guy there 2008-09-12 13:39 gotten around a lot?
2008-09-12 13:39 drei jahre in Deutscheland [three years in Germany] 2008-09-12 13:39 um, 6 jahre [six years] 2008-09-12 13:40 that would be 6 Jahre 2008-09-12 13:40 getting rusty 2008-09-12 13:40 for work i guess? 2008-09-12 13:40 and fun 2008-09-12 13:40 well, i've only been to the usa for 11 months 2008-09-12 13:40 and that was for school... and fun 2008-09-12 13:41 that's about enough to be honest 2008-09-12 13:41 berlin is a lot more fun 2008-09-12 13:41 and less tense 2008-09-12 13:41 contrary to popular belief 2008-09-12 13:41 only been there a few times, mostly during the ccc congresses 2008-09-12 13:41 but it certainly is fun 2008-09-12 13:41 geek hotbed 2008-09-12 13:42 ok gotta go, will be back in an hour or so 2008-09-12 13:42 hottest hotbed in europe imho 2008-09-12 13:42 bis spater dann [see you later, then] 2008-09-12 13:42 und zu weit weg :) [and too far away] 2008-09-12 13:42 bis dann [see you] 2008-09-12 13:42 ACTION is getting really rusty ;) 2008-09-12 13:44 I just had a thought 2008-09-12 13:44 we should schedule an official tux3 cabal meeting for Oct 31 2008-09-12 13:44 on irc, plus a real location in LA 2008-09-12 13:47 might be a good time 2008-09-12 14:12 -!- tim_dimm(~timothyhu@adsl-67-114-40-138.dsl.scrm01.pacbell.net) has joined #tux3 2008-09-12 14:12 hi tim_dimm 2008-09-12 14:12 coming up for air? 2008-09-12 14:12 hi flips 2008-09-12 14:12 trying to 2008-09-12 14:12 manage a quick skate today? 2008-09-12 14:13 still in sacramento 2008-09-12 14:13 oh right 2008-09-12 14:13 Pi and Persey are still in the ICU 2008-09-12 14:13 and it would be inadvisable anyway 2008-09-12 14:13 how's that?
2008-09-12 14:13 you got a week with no weeks under you to look forward to 2008-09-12 14:13 heh 2008-09-12 14:13 can't justify skating, unless it is to the nursery 2008-09-12 14:14 with no wheels under you I meant 2008-09-12 14:14 I can justify it if the heart rate is up enough 2008-09-12 14:14 getting bad typoitis here 2008-09-12 14:14 full word typos now 2008-09-12 14:14 happens when the volume of code goes stratospheric 2008-09-12 14:14 just read through the first of two tux3 U sessions 2008-09-12 14:14 it hung together pretty well 2008-09-12 14:14 going to now 2008-09-12 14:15 not too much pure bs 2008-09-12 14:15 some content 2008-09-12 14:15 what time of day did you start? 2008-09-12 14:15 today? 2008-09-12 14:15 or last night? 2008-09-12 14:15 8 pm tue and thur 2008-09-12 14:15 no, for the U 2008-09-12 14:15 k 2008-09-12 14:15 will be regular 2008-09-12 14:15 as far as I can manage 2008-09-12 14:15 sounds like a great tool for building community 2008-09-12 14:15 we had one inner linux guru here thursday 2008-09-12 14:16 eric biederman 2008-09-12 14:16 linux cluster guy 2008-09-12 14:16 the linux cluster guy 2008-09-12 14:16 and of course natalie is an inner linux gal 2008-09-12 14:16 googling... 2008-09-12 14:16 you'll get a few hits ;-) 2008-09-12 14:17 you get that email this am about SE Linux? 2008-09-12 14:17 223K to be exact 2008-09-12 14:18 no 2008-09-12 14:18 let me check 2008-09-12 14:18 oh 2008-09-12 14:18 yes 2008-09-12 14:19 right direction? 2008-09-12 14:19 knew about apparmor, suse's better answer to selinux 2008-09-12 14:19 uses the same kernel hooks 2008-09-12 14:19 yes 2008-09-12 14:19 I'm not sure how much apparmor is being worked on right now 2008-09-12 14:19 it's another one of those good projects that gets beaten up by something sloppier but more devs 2008-09-12 14:20 apparently, Novell canned all the engineers working on it in '07 2008-09-12 14:20 bh...Got any intel on apparmor? 2008-09-12 14:20 ? 
2008-09-12 14:20 ah 2008-09-12 14:20 ACTION pokes bh 2008-09-12 14:21 heh 2008-09-12 14:21 be nice to know what happened there 2008-09-12 14:21 see who's maintaining it even 2008-09-12 14:21 somebody always maintains os projects 2008-09-12 14:21 they never die... except for evms 2008-09-12 14:21 RIP 2008-09-12 14:21 well 2008-09-12 14:22 lvm3 will rise ;-) 2008-09-12 14:22 we're about a month away from serious lvm3 development 2008-09-12 14:22 http://www.novell.com/linux/security/apparmor/selinux_comparison.html 2008-09-12 14:22 kickoff 2008-09-12 14:22 tim_dimm, there's a proposal to have a public tux3 cabal meeting on Oct 31 2008-09-12 14:22 physically located at a certain garage I'm thinking of 2008-09-12 14:23 and on the web/net 2008-09-12 14:23 what think you? 2008-09-12 14:23 I'm there barring spit-up, diaper changes and burping sessions 2008-09-12 14:23 barring? 2008-09-12 14:23 should be irrespective of 2008-09-12 14:23 uh, poor choice of words 2008-09-12 14:23 anything except burping 2008-09-12 14:23 farting? 2008-09-12 14:23 not good enough 2008-09-12 14:23 too early to teeth 2008-09-12 14:24 how do you spell that- teeeethhh 2008-09-12 14:24 you know 2008-09-12 14:24 dana had one the first week 2008-09-12 14:24 was hell for anna 2008-09-12 14:24 i bet 2008-09-12 14:24 she quickly learned how to punish mommy with it 2008-09-12 14:25 so began a somewaht tense relationship ;) 2008-09-12 14:25 still? 
2008-09-12 14:25 ;-) 2008-09-12 14:25 of course 2008-09-12 14:25 but detent has set in, mutual respect, mommy love, all that 2008-09-12 14:25 this apparently lasts till about 9 YO 2008-09-12 14:26 with luck 2008-09-12 14:26 guess they grow out of it 2008-09-12 14:26 tween is the new teen 2008-09-12 14:26 anyway 2008-09-12 14:26 we better stop talking like that 2008-09-12 14:26 or all the devs willrun away screaming 2008-09-12 14:27 tux3 and child-rearing 2008-09-12 14:27 and skating 2008-09-12 14:27 you will have lots of time to learn C while you're burping 2008-09-12 14:27 and rocking 2008-09-12 14:27 right, on that note I think I'll go skate now 2008-09-12 14:27 oh, big news 2008-09-12 14:27 k 2008-09-12 14:28 all ears 2008-09-12 14:28 (eyes) 2008-09-12 14:28 skateboarders clapped for my move yesterday 2008-09-12 14:28 nice- what was it? 2008-09-12 14:28 pronounced: "ok rollerbladers are allowed now" 2008-09-12 14:28 nothing much 2008-09-12 14:28 grind? 2008-09-12 14:28 skated up on the little vert wall on one skate, tapped the top with the other, skated down on one skate 2008-09-12 14:29 nice 2008-09-12 14:29 been grinding and getting nodes 2008-09-12 14:29 also skating down the top of the grinding wall 2008-09-12 14:29 very skinny 2008-09-12 14:29 tough to stay on 2008-09-12 14:29 it has an S curve at the end 2008-09-12 14:29 careful, grinds lead to crashes which leads to wrist injuries 2008-09-12 14:29 not much, but enough to drop you off 2008-09-12 14:29 my grinds aren't really grinds 2008-09-12 14:30 just slding down the rail onthe side of my skate 2008-09-12 14:30 one foot 2008-09-12 14:30 no danger 2008-09-12 14:30 I need to get protection before doing anything more 2008-09-12 14:30 makes lots of noise 2008-09-12 14:30 attracts attention ;) 2008-09-12 14:31 found the head of the U logs 2008-09-12 14:31 reading now 2008-09-12 14:31 have fun 2008-09-12 14:31 loads 2008-09-12 15:08 i just noticed: someone (?) said that dentries were 132 bytes. 
on my system it says 200. Normal deviations? 2008-09-12 15:09 or just a different kernel version? 2008-09-12 15:10 I'm reading the logs right now, see 8 references to dentries. none mention how many bytes 2008-09-12 15:11 second.05:29 < RazvanM> dentry 253015 253576 132 29 1 : tunables 120 60 8 : slabdata 8744 8744 0 2008-09-12 15:12 my bad- I searched for dentries 2008-09-12 15:12 not dentry 2008-09-12 15:13 well, not really important. just something i was wondering about 2008-09-12 15:14 data, 64 bit kernel? 2008-09-12 15:15 grossly big aren't they 2008-09-12 15:15 filename "foo" turns into a 200 byte dentry, and that's far from all the cache gobbling for that little guy 2008-09-12 15:15 yes, it is. right on both accounts. 2008-09-12 15:16 that's what makes sysfs such an idiotic idea 2008-09-12 15:16 take tiny little ascii strings which are already bloated way beyond the binary rep, and blow them up into gigantic, slow, awkward things 2008-09-12 15:17 then implement it badly on top of that 2008-09-12 15:17 and have a crappy internal and external interface 2008-09-12 15:17 bugs 2008-09-12 15:17 unstable api 2008-09-12 15:17 and you have the piece of shit we see today 2008-09-12 15:17 just thought I'd share that ;-) 2008-09-12 15:50 sk8 oclock 2008-09-12 15:54 ACTION is back 2008-09-12 15:54 I know nothing about apparmor 2008-09-12 15:57 kernel klink I presume ;-) 2008-09-12 15:57 (hogan's hero's) 2008-09-12 15:58 diaper-30 2008-09-12 15:58 l8tr 2008-09-12 18:17 nuther cuppa 2008-09-12 18:17 should be good enough to get refcounting implemented 2008-09-12 18:32 ok I see october 31st is a friday 2008-09-12 18:32 that means that the tux3 cabal meeting has to be a party 2008-09-12 18:32 might have to scale this up 2008-09-12 19:04 #define REFCOUNT_TABLE_BLOCK (1ULL << 28) 2008-09-12 19:04 #define REFCOUNT_HIGH_BLOCK (REFCOUNT_TABLE_BLOCK + (1ULL << 21)) 2008-09-12 19:04 #define UNATOM_TABLE_BLOCK (REFCOUNT_TABLE_BLOCK + (1ULL << 23)) 2008-09-12 19:10 -!- 
Aks(~ankitsriv@123.237.71.198) has joined #tux3 2008-09-12 19:16 Hi all, I am new to this project and want to know abt versioned pointers. 2008-09-12 19:16 welcome 2008-09-12 19:16 read the post yet? 2008-09-12 19:17 http://lwn.net/Articles/288896/ 2008-09-12 19:17 thanks for the link 2008-09-12 19:18 enjoy 2008-09-12 19:22 atom = dir->sb->atomgen++; /* use refcount for allocation */ 2008-09-12 19:22 if (!ext2_create_entry(dir, name, len, atom, 0)) { 2008-09-12 19:22 unsigned block = ATOM_REFCOUNT_BLOCK + (atom >> (dir->sb->blockbits - 1)); 2008-09-12 19:22 struct buffer *buffer = bread(dir->map, block); 2008-09-12 19:22 *(u16 *)buffer->data += 1; 2008-09-12 19:22 brelse(buffer); 2008-09-12 19:22 return atom; 2008-09-12 19:22 } 2008-09-12 19:23 got to put in the carry bit handling 2008-09-12 19:29 mistake in that code 2008-09-12 19:30 ACTION challenges tux3 readers to find it 2008-09-12 19:30 it's in the block calc 2008-09-12 19:32 ACTION steps out for a bit 2008-09-12 19:32 oh, and I have to do endian conversion 2008-09-12 19:32 almost forgot 2008-09-12 20:20 atom >> (dir->sb->blockbits - 1) - that looks weird, although I'm not actually sure what exactly blockbits is, either way the -1 smells wrong 2008-09-12 20:21 ah, never mind 2008-09-12 20:21 *(u16 *)buffer->data += 1 <- this is wrong since this is first u16 in block 2008-09-12 20:23 lacks [atom & ((1 << (dir->sb->blockbits)) - 1] instead of the prefix '*' 2008-09-12 20:23 ie. it should be ((u16 *)buffer->data)[atom & ((1 << (dir->sb->blockbits)) - 1]++; 2008-09-12 20:24 ((u16 *)buffer->data)[atom & ((1 << dir->sb->blockbits) - 1)]++; 2008-09-12 20:24 parentheses got mixed up 2008-09-12 20:25 from which I guess blockbits on a 4K filesystem is lg2(4Ki) = 12 2008-09-12 20:26 at which point the first comment about it smelling funny is irrelevant (I thought blockbits was the number of bits in a block, ie. 4Ki * 8 for a 4KiB block) 2008-09-12 20:26 so you're using bread/brelse, not bios? 
2008-09-12 20:26 does brelse bwrite? 2008-09-12 20:27 I guess it must work like some sort of in kernel mmap 2008-09-12 20:27 hence the no nead for explicit write back 2008-09-12 20:27 at which point I guess we rely on cpu page dirty bits to actually know whether we need to write back 2008-09-12 20:31 unless bread/brelse, are actually operations on blocks within a file, which is suggested by the dir->map first parameter 2008-09-12 20:31 hmm, isn't it clear I have no bloody idea what I'm talking about yet? 2008-09-12 20:31 and I'm talking into a blackhole... 2008-09-12 20:31 start talking to yourself... then you know you're going crazy. 2008-09-12 20:41 -!- Kirantpatil(~kiran@122.167.202.116) has joined #tux3 2008-09-12 21:08 maze, it's all messed up, actually 2008-09-12 21:08 rev on the way 2008-09-12 21:08 you are right about the lacks 2008-09-12 21:08 maze, I should just have asked you to write it ;-) 2008-09-12 21:08 ACTION will be back 2008-09-12 21:09 which lacks 2008-09-12 21:09 don't leave ;-) I'm here 2008-09-12 21:09 what am I right about? I was guessing above? 
2008-09-12 21:10 ofcourse with u16* in their it's all only host-endian compatible 2008-09-12 21:10 s/their/there 2008-09-12 21:11 I think stick to 1 byte counters - deal away with endianness in all but 0.5% of the cases 2008-09-12 21:12 eh, I'm not sure this entire effort is worth it 2008-09-12 21:12 I think you just need to support a half-dozen hard coded atoms, a small list of atoms for selinux, and store the rest as strings 2008-09-12 21:12 ultimately just gzipping the xattr block may be the easiest 2008-09-12 21:13 for everything besides selinux/acl 2008-09-12 21:13 [still parsing through the binary acl encoding, to see if it can be faked] 2008-09-12 21:20 -!- nataliep(~nataliep@207.47.98.129.static.nextweb.net) has joined #tux3 2008-09-12 21:33 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-12 21:35 although linux currently seems to use something more like 32 [4 byte header] + (16 [tag/type] + 16[...rwx] + 32[default=-1]) * 4 + (16 [tag/type] + 16[...rwx] + 32[uid/gid]) * [# of exceptions] bits 2008-09-12 21:35 either way, while these are normally small - or not even present - they can grow arbitrally large 2008-09-12 21:37 hmm 2008-09-12 21:37 interesting questions 2008-09-12 21:37 do filesystems in linux implement selinux and acls, or do they just implement xattr - would think just xattr, but... there are ext2/3/4... etc acl.h 2008-09-12 21:38 oh, directories get doubled entries, one being default 2008-09-12 21:38 the other being actual 2008-09-12 21:39 the 4 byte header is the version (lendian 2) 2008-09-12 22:00 back 2008-09-12 22:00 maze, it's not much effort 2008-09-12 22:00 . 2008-09-12 22:00 and it exactly emulates the ascii xattr behaviour, with superior compression 2008-09-12 22:01 now I just need to write it right ;) 2008-09-12 22:01 yes, but there are some issues there 2008-09-12 22:01 there are? 
2008-09-12 22:01 for example: selinux is a quad of four values 2008-09-12 22:01 I thought I was just about done 2008-09-12 22:01 if you have a lot of valid states for each of them 2008-09-12 22:01 then the total number of states for the entire quad can blossom 2008-09-12 22:01 those are attr bodies 2008-09-12 22:02 we're working on attr names at the moment 2008-09-12 22:02 ah, and see... this is the tricky part 2008-09-12 22:02 you kind of have to look at some of them at the same time 2008-09-12 22:02 some of... selinux acls? 2008-09-12 22:03 so basically, once you strip out the selinux and extended acl xattrs (that's 3 different xattr strings), all that's left is barely used by anyone 2008-09-12 22:03 ok I see what you're saying 2008-09-12 22:03 sorry, didn't read carefully 2008-09-12 22:03 you're running out way ahead of me 2008-09-12 22:03 as usual 2008-09-12 22:03 it may not be worth optimizing that... 2008-09-12 22:03 well it can be optimized at the selinux level 2008-09-12 22:04 exactly. 2008-09-12 22:04 maybe not quite as efficiently 2008-09-12 22:04 maybe more 2008-09-12 22:04 so selinux basically needs to be optimized 2008-09-12 22:04 let's let them tell us 2008-09-12 22:04 acls need to be optimized 2008-09-12 22:04 probably 2008-09-12 22:04 and then all the rest needs to be (maybe?) optimized if we feel like it 2008-09-12 22:04 so I suggest that once xattrs are working properly, we invite the selinux folks to come over and do an audit 2008-09-12 22:04 and what needs to be optimized is not the headers (ie. the security.something= part) 2008-09-12 22:04 but the bodies 2008-09-12 22:05 ie. the part after the = 2008-09-12 22:05 both 2008-09-12 22:05 imho 2008-09-12 22:05 the part before the = sign is trivial, there are about 9 values to compress as atoms, leave the rest as strings 2008-09-12 22:05 the heads are much less variable, therefore so much easier to optimize 2008-09-12 22:05 agreed. 
2008-09-12 22:05 the problem, though, isn't so much what to optimize 2008-09-12 22:06 so... bodies 2008-09-12 22:06 not that hard 2008-09-12 22:06 but how to store this, and where... 2008-09-12 22:06 but maybe not appropriate at this level 2008-09-12 22:06 we'll see 2008-09-12 22:06 and which parts to store on disk where 2008-09-12 22:06 it's kind of important to plan this out correctly to begin with, because this is ondisk format, not in memory 2008-09-12 22:06 lyou know, we'd probably get more mileage out of giving the selinux guys a way to run their own dictionary 2008-09-12 22:06 just like our atom dictionary 2008-09-12 22:06 exactly... 2008-09-12 22:06 and have an api for it 2008-09-12 22:06 kay 2008-09-12 22:06 we'll propose it 2008-09-12 22:06 I'm convinced now we need 4 dicts for selinux (one per quad) 2008-09-12 22:07 but first let them have a look at the basics 2008-09-12 22:07 and a dictionary for acls 2008-09-12 22:07 and a dictionary for 'other' xattrs headers 2008-09-12 22:07 it's my understanding they're usually disappointed with performance etc of just basic xattrs 2008-09-12 22:07 so basically what we need is some extensible dict interface 2008-09-12 22:07 trying not to let tux3 fall into that 2008-09-12 22:07 which falls into the log nicely 2008-09-12 22:07 right 2008-09-12 22:07 and it's app specific 2008-09-12 22:07 we need to export an api, not for dicts 2008-09-12 22:07 not four dicts 2008-09-12 22:08 the api would be the normal add/remove/list xattr api that everybody uses 2008-09-12 22:08 what's important is how we store it internally in the fs 2008-09-12 22:08 plus a wahy of dividing it into four 2008-09-12 22:08 that's easy - split on : 2008-09-12 22:08 bleah 2008-09-12 22:08 no parsing 2008-09-12 22:08 in the fs 2008-09-12 22:09 mechanism, not policy 2008-09-12 22:09 yeah, well, that's gonna have to happen, unless you don't want to split the quads 2008-09-12 22:09 which could potentiall explode the dics 2008-09-12 22:09 no parsing ;-) 
2008-09-12 22:09 notice, tux3 has no parsing 2008-09-12 22:09 if we want we can use zlib 2008-09-12 22:09 or something more global 2008-09-12 22:09 you're missing the point here ;-) 2008-09-12 22:09 zlib is nice and all that 2008-09-12 22:10 but selinux xattrs are used on every frickin file access 2008-09-12 22:10 still 2008-09-12 22:10 they have to be blazing fast 2008-09-12 22:10 no parsing ;-) 2008-09-12 22:10 we need a better solution 2008-09-12 22:10 you can make the parsing in such a way that it'll still work even if it doesn't parse 2008-09-12 22:10 preferably one that performs even better than stupid ascii colon separated strings 2008-09-12 22:10 ah, but the ascii colon seperated strings are the api 2008-09-12 22:11 you have to do it that way 2008-09-12 22:11 sucky api 2008-09-12 22:11 unless we rip through all of the selinux code in the kernel 2008-09-12 22:11 anyway, linux does not have an acl api 2008-09-12 22:11 selinux does 2008-09-12 22:11 different 2008-09-12 22:11 the vfs layer provides an xattr api 2008-09-12 22:11 that's what we have to implement 2008-09-12 22:11 right, only that 2008-09-12 22:11 well 2008-09-12 22:11 we don't have to parse 2008-09-12 22:11 however internally we have to make it deal with the common cases quickly 2008-09-12 22:12 we can compress on byte pair if we like 2008-09-12 22:12 byte pair of what? you're getting strings in the api? 
2008-09-12 22:12 byte pairs is a typical compression method 2008-09-12 22:12 16 bit values work better with a dict than 8 or 48 2008-09-12 22:12 for example 2008-09-12 22:12 and the common cases are going to be reading (and to a lesser extent writing) selinux xattrs and less often (but still very often) extended acls 2008-09-12 22:13 verging on premature optimization here 2008-09-12 22:13 no no, don't think lzw compression - that doesn't buy us anything here 2008-09-12 22:13 the selinux guys will cream all over if xattrs just work fine 2008-09-12 22:13 what needs to be done is we need to explicitly remove the selinux/acl from the xattr code and not treat them in the fs as xattrs at all 2008-09-12 22:13 treat them like you treat inode permissions 2008-09-12 22:14 put them directly in the inode 2008-09-12 22:14 have you measured the actual disbribution of unique quads? 2008-09-12 22:14 I thought you did that 2008-09-12 22:14 yes 2008-09-12 22:14 and it came out very tight 2008-09-12 22:14 exactly 2008-09-12 22:14 tightly clustered too 2008-09-12 22:14 so what's the problem 2008-09-12 22:14 but that's not something we can guarantee on a prod system 2008-09-12 22:14 just atomize the common ones 2008-09-12 22:14 store the weirdos literally 2008-09-12 22:14 agreed. 2008-09-12 22:14 ok 2008-09-12 22:14 so let's do it 2008-09-12 22:15 hmm, how to put this 2008-09-12 22:15 you don't want the xattr_get(selinux_xattr) 2008-09-12 22:15 to have to parse the entire xattr block for the inode 2008-09-12 22:16 it doesn't 2008-09-12 22:16 it only looks in the xcache 2008-09-12 22:16 but xattrs of other types, can be pretty much unique per file... 
2008-09-12 22:16 (above md5/sha1 hash case) 2008-09-12 22:17 right 2008-09-12 22:17 so I guess we're going to check a hash of the xattr 2008-09-12 22:17 htree style 2008-09-12 22:17 so having the two very differently performing/characteristic concepts in one place will most likely break performance 2008-09-12 22:18 it's not very hard to look for likely atomize candidates I think 2008-09-12 22:18 depends on what the symbols of the alphabet 2008-09-12 22:18 and how deeply you're parsing 2008-09-12 22:18 I say, just put everything in the dict 2008-09-12 22:18 why not? 2008-09-12 22:18 has to be stored somewhere 2008-09-12 22:19 security.selinux="unconfined_u:object_r:default_t:s0\000" 2008-09-12 22:19 system.posix_acl_access="0sAgAAAAEABwD/////AgAEAGQAAAAEAAUA/////xAABQD/////IAAFAP////8=" 2008-09-12 22:19 user.hash="sdfsdfjhsdjfhsdjkfhdjskahfjkdsahkj" 2008-09-12 22:19 the dict is as good a place as any 2008-09-12 22:19 how would you atomize the above? 2008-09-12 22:19 ext2_find_entry 2008-09-12 22:19 later, htree_find_entry 2008-09-12 22:20 sucky compression in the example 2008-09-12 22:21 so how will the dict deal, with a few dozen entries with milions of occurences, a few hundred with tens of thousands, and a few million entries with one (to a couple) occurence(s) each 2008-09-12 22:21 that's a real world scenario straight of my laptop 20G drive 2008-09-12 22:21 millions of occurences, what's the problem? 
2008-09-12 22:21 few million, easy 2008-09-12 22:21 that's what htree does 2008-09-12 22:21 handles millions of entries 2008-09-12 22:21 really fast 2008-09-12 22:22 uhm, I think my problem is I'm not convinced it's fast enough, when it could be O(1) 2008-09-12 22:22 o(1) is always good 2008-09-12 22:22 the millions of unique entries are blossoming the tree 2008-09-12 22:22 but damm fast is damm fast 2008-09-12 22:23 it's a btree 2008-09-12 22:23 slowing down accesses for the millions of entries 2008-09-12 22:23 it likes to blossom 2008-09-12 22:23 it says "go ahead, make my day" 2008-09-12 22:23 right, but are common entries stored nearer the root? 2008-09-12 22:23 never 2008-09-12 22:23 no - because it's a btree 2008-09-12 22:23 right 2008-09-12 22:23 so you've got o(depth) lookups 2008-09-12 22:23 very flat 2008-09-12 22:23 usually only two levels 2008-09-12 22:23 precisely what you want to avoid 2008-09-12 22:23 for a few million entires 2008-09-12 22:23 depth is smaller than you think 2008-09-12 22:24 much smaller 2008-09-12 22:24 but you're thinking of access speed from a disk io performance 2008-09-12 22:24 outlook 2008-09-12 22:24 nope 2008-09-12 22:24 we need to be fast in ram 2008-09-12 22:24 cpu speed 2008-09-12 22:24 that's what it works at 2008-09-12 22:24 dirops are cpu bound 2008-09-12 22:24 not disk bound 2008-09-12 22:25 ok, I'm firmly of the opinion we need 2 different dicts/htrees at the minimum 2008-09-12 22:25 I agree: 1) heads 2) bodies 2008-09-12 22:25 but I know you want to parse and segment 2008-09-12 22:25 one small one for the stuff which is known to exist all over the place (selinux/acl bodies) 2008-09-12 22:25 I don't think we should, the selinux guys should 2008-09-12 22:25 but 2008-09-12 22:25 the other for non-standard bodies 2008-09-12 22:25 I;'ll keep an open mind 2008-09-12 22:25 we would need to change the kernel vfs interface of selinux - I don't see that happening 2008-09-12 22:26 and then we'd need to keep around the old one 
for other fs'es anyway 2008-09-12 22:26 well xattrs already have namespaces 2008-09-12 22:26 part of the abpi 2008-09-12 22:26 api 2008-09-12 22:26 braindamaged part 2008-09-12 22:26 and even if we don't split, we'll still get perf boosts 2008-09-12 22:26 (don't split on :) 2008-09-12 22:26 that's a colon ) 2008-09-12 22:26 I thought it was a smile :) 2008-09-12 22:26 well it was a ':' than a ')' 2008-09-12 22:26 ;-) 2008-09-12 22:27 ok, immediate goal is to fix the brain damage in my refcounting 2008-09-12 22:27 if you have a dentry and inode already in memory 2008-09-12 22:27 sorry about the pile of poo I posed ;) 2008-09-12 22:27 how long does it take to fetch the xattrs for that inode? 2008-09-12 22:27 order of magnitude 2008-09-12 22:27 oh that's another thing... we can easily put a small hash in front of the atom dict 2008-09-12 22:28 very easily 2008-09-12 22:28 sub microsecond 2008-09-12 22:28 I guess 2008-09-12 22:28 [because with the correct implementation the above is less than 50 cycles] 2008-09-12 22:28 sure, and a microsecond is about 3,000 2008-09-12 22:28 60 times slower 2008-09-12 22:29 so there's something to be gained 2008-09-12 22:29 I thinjk we gain most of it from putting a hash in front of the dirops 2008-09-12 22:29 hash of what? 2008-09-12 22:29 so we end up with level 1, level 2 2008-09-12 22:29 hash of the thing we're atomizing 2008-09-12 22:29 just keep the common ones there 2008-09-12 22:29 let the cold ones drop off 2008-09-12 22:30 ok, now you've gotten ahead of me... 2008-09-12 22:30 it's a linux meme 2008-09-12 22:30 dentry hash as an example of that 2008-09-12 22:30 ok, how about, first a question: what exactly is an atom (example?) 2008-09-12 22:30 sucky example 2008-09-12 22:30 [how big is an atom] 2008-09-12 22:30 it's just a small integer with a name 2008-09-12 22:30 and a refcount 2008-09-12 22:30 ok, the names, can we have an example? 
2008-09-12 22:30 the name of an atom is up to 255 chars (tradition) 2008-09-12 22:31 names are unrestricted 2008-09-12 22:31 pascal strings ;-) 2008-09-12 22:31 right 2008-09-12 22:31 in fact they are pascal strings 2008-09-12 22:31 that's what ext2 uses 2008-09-12 22:31 and what I always use 2008-09-12 22:31 ok, so how would you atomize (where would the atom boundaries be) in the above 3 line xattr example I posted? 2008-09-12 22:32 beats the crap out of shitty C strings 2008-09-12 22:32 strlen is fast ;-) 2008-09-12 22:32 sucks compared to looking up a byte 2008-09-12 22:32 strlen does cacheline damage 2008-09-12 22:32 "considered harmful to cache lines" 2008-09-12 22:32 I meant strlen is fast on pascal strings 2008-09-12 22:32 right 2008-09-12 22:33 I'd atomize the whole 3 line xattr 2008-09-12 22:33 and store the atom 2008-09-12 22:33 the whole thing? 2008-09-12 22:33 and have a limit of 2^48 atoms 2008-09-12 22:33 ok, on my drive, you'd have all refcounts = 1 2008-09-12 22:33 right now, it's 2^32 atoms 2008-09-12 22:33 if we do bodies, might want to widen that 2008-09-12 22:33 sure 2008-09-12 22:33 who cares 2008-09-12 22:34 selinux does 2008-09-12 22:34 refcounts take hardly any space 2008-09-12 22:34 2 bytes each 2008-09-12 22:34 now, as soon as something _does_ collide, you know right away 2008-09-12 22:34 that is 2008-09-12 22:34 match another body 2008-09-12 22:34 well 2008-09-12 22:34 anyway 2008-09-12 22:34 it's premature 2008-09-12 22:35 xattrs have to work 2008-09-12 22:35 or nobody cares how well we compress acls 2008-09-12 22:35 yes, but they have to be treated seperately 2008-09-12 22:35 here - I'll write up how it should be done IMHO 2008-09-12 22:35 I know, you want to tokenize 2008-09-12 22:35 the xattr string 2008-09-12 22:35 and compress that way 2008-09-12 22:36 so why don't they tokenize? 
2008-09-12 22:36 yes please 2008-09-12 22:36 and let's invite the selinuxen to read that post 2008-09-12 22:37 builtin:[security.selinux] sedict1:[unconfined_u] sedict2:[object_r] sedict3:[default_t] sedict4:[s0] 2008-09-12 22:37 builtin:[system.posix_acl_access] acl_dict:[0sAgAAAAEABwD/////AgAEAGQAAAAEAAUA/////xAABQD/////IAAFAP////8] 2008-09-12 22:37 builtin:[user.] user_dict:[hash] user_dict:[sdfsdfjhsdjfhsdjkfhdjskahfjkdsahkj] 2008-09-12 22:37 I'm still researching the subject 2008-09-12 22:37 and then you want to store the first two directly within the inode 2008-09-12 22:37 got to fix my mess here now 2008-09-12 22:37 that's for one file? 2008-09-12 22:38 sedict1=2=3=4 could be the same dict, potentially could be the same dict as the acl_dict, potentially the same as builtin 2008-09-12 22:38 yes 2008-09-12 22:38 we stoe all of the directly in the inode 2008-09-12 22:38 on disk 2008-09-12 22:38 and cache them in memory 2008-09-12 22:38 when the inode is loaded 2008-09-12 22:38 so what's the size of an inode on disk? 2008-09-12 22:38 variable 2008-09-12 22:38 maximum? 2008-09-12 22:39 from about 40 bytes to unlimited 2008-09-12 22:39 current limitation is an inode table block 2008-09-12 22:39 but that will go away 2008-09-12 22:40 hmm, I need to start writing a junk fs 2008-09-12 22:40 to get a better feeling for the kernel interfaces 2008-09-12 22:40 you'll get an excellent chance very soon 2008-09-12 22:40 we're going to start by porting a junk fs to kernel 2008-09-12 22:40 next tuesday we'll do that 2008-09-12 22:41 really? 
2008-09-12 22:41 hmm 2008-09-12 22:41 promise 2008-09-12 22:41 you said you wanted me to pick up the pace 2008-09-12 22:41 pick it up we shall 2008-09-12 22:41 hope we don't lose anybody ;) 2008-09-12 22:41 I'm beginning to think an fs should actually have (at least) two layers 2008-09-12 22:42 basically the frontend (UI / interface with vfs) and the backend (interface with block devices) 2008-09-12 22:42 with the ability for the middle to be network seperated 2008-09-12 22:42 what about the inodes in the middle? 2008-09-12 22:42 you need a clean api in the middle that deals correctly with coherency issues 2008-09-12 22:43 but I think this is the only way to get a well performing net fs 2008-09-12 22:43 you are entirely correct, and that is how tux3 is structured 2008-09-12 22:43 it has the cache level and the block level 2008-09-12 22:43 they are separately cleanly... better be 2008-09-12 22:43 or it simply won't work 2008-09-12 22:44 the backend is then a get/set/lock/unlock/notify system 2008-09-12 22:44 well 2008-09-12 22:44 kinda 2008-09-12 22:44 the back end is more like async messages 2008-09-12 22:44 loosely 2008-09-12 22:44 very loosely 2008-09-12 22:44 yeah, that kind of describes what I'm thinking 2008-09-12 22:45 hard to phrase really 2008-09-12 22:45 especially since it's still unclear to me ;-) 2008-09-12 22:45 the fact it has to be implemented as to separate pieces is now clear to me, with an interface layer that uses tcp-ip 2008-09-12 22:45 BUT 2008-09-12 22:45 can short-circuit the network stack on local host 2008-09-12 22:46 great, it was never clear to me ;) 2008-09-12 22:46 it just came out like that 2008-09-12 22:46 did it itself 2008-09-12 22:46 here's an example: 2008-09-12 22:47 application -> user space -> kernel space -> vfs layer -> client file system layer -> send rpc call -> network stack -> receive rpc call -> dispatch -> server file system layer -> block device layer 2008-09-12 22:47 and that's only half the loop 2008-09-12 22:47 for 
an nfs 2008-09-12 22:47 now if the tcp/ip network stack layer does cookies or UUID to identify that it's talking to itself, than it can zip it up to 2008-09-12 22:48 client fs layer -> direct dispatch -> server fs layer 2008-09-12 22:48 and of course there's the return path 2008-09-12 22:48 that's the kind of thinking that originally lead to nfs 2008-09-12 22:48 actually, the reverse of that 2008-09-12 22:48 and it has to be part sync, part async, part notify 2008-09-12 22:48 we had your second one 2008-09-12 22:48 and some genius decided it could easily be hacked to be the first one 2008-09-12 22:48 anyway 2008-09-12 22:49 it's not a NFS 2008-09-12 22:49 the real problem is now how to minimize the latency and data sent across the net in the middle 2008-09-12 22:49 it will become a cluster fs before it becomes an nfs 2008-09-12 22:49 and that relies on doing cache coherency and read/write (various types of) and notifications of changes/lock/lock-breaking correctly 2008-09-12 22:49 yes 2008-09-12 22:50 anyway... 
enough about my plans to conquer the world 2008-09-12 22:50 which _nobody_ in the oss world has succeed in doing well 2008-09-12 22:50 probably also not in the propietary world either 2008-09-12 22:50 anyway, you need to plan the entire fs from the ground up with the assumption all the clients (even the local host) are remote 2008-09-12 22:50 since we can't see the code or try it we don't know 2008-09-12 22:50 that way you don't need to deal with the local host specially 2008-09-12 22:50 well 2008-09-12 22:51 you don't have to put in remote hooks from the beginning 2008-09-12 22:51 [except for the dispatch optimization] 2008-09-12 22:51 you just have to be aware of where problems can be created 2008-09-12 22:51 you have to design it as if they were there 2008-09-12 22:51 you might not code it quite like that 2008-09-12 22:51 although I think you should 2008-09-12 22:51 whre it costs nothing, yes 2008-09-12 22:51 even if the net code is a shim .h file 2008-09-12 22:51 that's seldom the case 2008-09-12 22:52 but the real problem is, you're answering a demand that doesn't exist 2008-09-12 22:52 people have been optimizing for the wrong situation ;-) 2008-09-12 22:52 you're hoping your nfs will be so much more amazing, everybody will use it instead of sucky nfs 2008-09-12 22:52 what demand do you think that is? 2008-09-12 22:52 but you're likely to be amazed and disappointed 2008-09-12 22:52 truthfully? 
2008-09-12 22:52 there's very little demand for a good nfs outside of hpc 2008-09-12 22:52 and they like lustre 2008-09-12 22:52 I don't care about who uses it or not ;-) 2008-09-12 22:53 prefectly happy 2008-09-12 22:53 they just want it to be more reliable and faster 2008-09-12 22:53 I just like good design 2008-09-12 22:53 well 2008-09-12 22:53 just writing a dlm to support it will keep you busy for months 2008-09-12 22:53 if you know _exactly_ what to do 2008-09-12 22:53 yeah, I know, not a good way to design it ;-) 2008-09-12 22:53 if you want to make money 2008-09-12 22:53 but oh well 2008-09-12 22:54 you can do it 2008-09-12 22:54 if you already have something working 2008-09-12 22:54 that people want 2008-09-12 22:54 and are willing to bribe you to make even more like what they want 2008-09-12 22:54 eh, dlm's aren't that hard if you have a clean api and don't have to deal with prior borkage 2008-09-12 22:54 well 2008-09-12 22:54 "not hard" translates into several months, trust me 2008-09-12 22:54 problem is the leakage of breakage from outside 2008-09-12 22:54 but prove me wrong 2008-09-12 22:55 I would like to have a good dlm 2008-09-12 22:55 in fact 2008-09-12 22:55 would you be kind enough to post a design note on dlm? 2008-09-12 22:55 because I'd like to cluster tux3 2008-09-12 22:55 nope, because I don't have the design yet ;-) 2008-09-12 22:55 by this time next year 2008-09-12 22:55 well 2008-09-12 22:55 when? 
2008-09-12 22:55 I think I'm going to try writing a junk fs this weekend 2008-09-12 22:55 unless stuff burns (I'm oncall) 2008-09-12 22:55 great 2008-09-12 22:56 and we'll see how well I understand the kernel apis 2008-09-12 22:56 I'll check in with you on saturday when you're 50% done 2008-09-12 22:56 you'll figure them out fast 2008-09-12 22:56 and I'm going to start writing directly in kernel space, because that's the entire purpose of the exercise ;-) 2008-09-12 22:56 little painful to get some of the crap to behave 2008-09-12 22:56 I believe debugging is probably easiest in kvm? 2008-09-12 22:56 uml 2008-09-12 22:56 far and away 2008-09-12 22:56 why? 2008-09-12 22:57 can you strace ? 2008-09-12 22:57 just: make defconfig ARCH=um && make linux ARCH=um; ./linux ubd0=/my/rootfs 2008-09-12 22:57 that's it 2008-09-12 22:57 you can gdb it 2008-09-12 22:57 ah 2008-09-12 22:57 takes a little coaxing 2008-09-12 22:58 where are the logs stored for this channel btw? 2008-09-12 22:58 checking tux3.org 2008-09-12 22:58 linked from shapor's page I think, which is linked from tux3.org 2008-09-12 22:58 http://shapor.com/tux3/irclogs/current.txt 2008-09-12 22:58 hehe, uptodate to the second 2008-09-12 22:59 ok, you will be having fun writing lots of new fs code and I will be slaving away finishing xattrs 2008-09-12 22:59 you got the better deal 2008-09-12 23:00 $ wget -q -O - "http://shapor.com/tux3/irclogs/current.txt" | cut -b18- | sed -rn 's@^<([^>]*)>.*@\1@p' | sort | uniq -c | sort -nr | head -n 9 2008-09-12 23:00 6427 flips 2008-09-12 23:00 1199 shapor 2008-09-12 23:00 908 MaZe 2008-09-12 23:00 818 bh 2008-09-12 23:00 616 konrad 2008-09-12 23:00 369 tim_dimm 2008-09-12 23:00 113 vandenoever 2008-09-12 23:00 104 RazvanM 2008-09-12 23:00 96 flipz 2008-09-12 23:00 interesting stats there 2008-09-12 23:01 how the hell I'm number 3 on that list I'll never know... 
2008-09-12 23:01 you're moving up fast 2008-09-12 23:01 fast typer 2008-09-12 23:01 just have to press the enter key enough :) 2008-09-12 23:01 well, yeah 2008-09-12 23:01 and wiggle those fingers 2008-09-12 23:01 now everybody is just typing stuff to boost their ranking ;-) 2008-09-12 23:02 nice example of sed chickentracks 2008-09-12 23:02 I am a sed-maniac 2008-09-12 23:02 you can cut and paste your code examples if you need a quick boost 2008-09-12 23:03 right, well, I also have a copy of spore waiting for me... 2008-09-12 23:03 wonder if it's any good 2008-09-12 23:04 let me know 2008-09-12 23:04 and I should reinstall my desktop with the next version of ubuntu 2008-09-12 23:04 my 4 year old can't wait to get her hands on pure 2008-09-12 23:04 the quad racing game 2008-09-12 23:04 from disney 2008-09-12 23:04 demon is much fun 2008-09-12 23:04 demo 2008-09-12 23:05 I'll probably take a few spins around the italian track after I do the next refcount iter 2008-09-12 23:05 folks 2008-09-12 23:05 unsigned attomoff = (atom << 1) & (-1 << blockbits); 2008-09-12 23:05 hey bh 2008-09-12 23:05 re above: new fs code includes xattrs eventually ;-) 2008-09-12 23:05 wow, when I need a reason not to code, one quickly arrives 2008-09-12 23:05 good luck with that ;) 2008-09-12 23:06 for me, a week on xattrs alone 2008-09-12 23:06 maybe you're faster 2008-09-12 23:06 uhm that code you posted looks wrong 2008-09-12 23:06 missing ~ 2008-09-12 23:06 yeah 2008-09-12 23:06 unsigned attomoff = (atom << 1) & ~(-1 << blockbits); 2008-09-12 23:06 that's why I pasted it ;) 2008-09-12 23:06 better than a compiler 2008-09-12 23:06 oh, sorry 2008-09-12 23:06 didn't realize it was a quiz 2008-09-12 23:06 heh 2008-09-12 23:07 no it was me actually fucking up 2008-09-12 23:07 in a way 2008-09-12 23:07 oh paste in wrong window? 
2008-09-12 23:07 and in a way, not 2008-09-12 23:07 no 2008-09-12 23:07 I pasted it, you saw the bug 2008-09-12 23:07 nice 2008-09-12 23:07 heh 2008-09-12 23:08 unsigned block = ATOM_REFCOUNT_BLOCK + ((atom >> blockits) << 1)); 2008-09-12 23:09 unsigned block = ATOM_REFCOUNT_BLOCK + (atom >> (blockits - 1)); 2008-09-12 23:09 the above is wrong 2008-09-12 23:09 again ;-) 2008-09-12 23:09 since it's always even 2008-09-12 23:09 are you running an IQ test or something? 2008-09-12 23:10 a stupidity test on myself 2008-09-12 23:10 atom is not always even 2008-09-12 23:10 why would it be? 2008-09-12 23:10 block is 2008-09-12 23:11 ((atom >> blockits) << 1)) <- always even 2008-09-12 23:11 the code you pasted results in blocks even/oddness being constant 2008-09-12 23:11 yeah, gosh 2008-09-12 23:11 I'd assume you don't want that 2008-09-12 23:11 well actually I think I do 2008-09-12 23:11 the even block is for the low 16 bits 2008-09-12 23:11 the odd for the high 2008-09-12 23:12 then you're still 1 off 2008-09-12 23:12 probably 2008-09-12 23:12 since what you described is true with double 8 bits 2008-09-12 23:12 not with double 16 bits 2008-09-12 23:12 I was definitely conflating things and being fuzzy 2008-09-12 23:12 that's why you will get your xattrs written in _less_ than a week 2008-09-12 23:13 unsigned block = ATOM_REFCOUNT_BLOCK + (atom >> (blockits - 1)) << 1; 2008-09-12 23:13 is what you want if you want double-blocks of u16s with low and high blocks 2008-09-12 23:13 yes 2008-09-12 23:13 thanks 2008-09-12 23:13 although that deserves a comment ;-) 2008-09-12 23:13 see, now I'm getting semantically addressable code 2008-09-12 23:14 I loosely indicate the semantics, and the code comes back 2008-09-12 23:14 what do you mean? 
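Below is a sketch of the interleaved layout described here, where the even block holds the low 16 bits and the odd block holds the high 16 bits, together with the carry into the high half. As typed in the channel, `ATOM_REFCOUNT_BLOCK + (atom >> (blockits - 1)) << 1` parses as `(ATOM_REFCOUNT_BLOCK + (atom >> (blockits - 1))) << 1`. The reason is that `+` binds tighter than `<<` in C, so the sketch parenthesizes explicitly. The base value, the function names and the +1/-1 restriction are assumptions for illustration. This is not the `use_atom` code posted to the list.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical base of the refcount map; the real value lives in tux3. */
#define ATOM_REFCOUNT_BLOCK 1000u

/*
 * Each block pair covers 2^(blockbits - 1) atoms (one u16 per atom).
 * The even block of the pair holds low halves; the odd block holds high halves.
 */
unsigned refcount_low_block(unsigned atom, unsigned blockbits)
{
    assert(blockbits > 1 && blockbits < 32);
    return ATOM_REFCOUNT_BLOCK + ((atom >> (blockbits - 1)) << 1);
}

unsigned refcount_high_block(unsigned atom, unsigned blockbits)
{
    return refcount_low_block(atom, blockbits) + 1;
}

/*
 * Apply +1 or -1 to a count split across two u16 halves.
 * Returns 1 when the high half changed too (its block must be dirtied),
 * which only happens when the low half wraps.
 */
int refcount_adjust(uint16_t *low, uint16_t *high, int delta)
{
    assert(delta == 1 || delta == -1);
    if (delta > 0) {
        if (++*low != 0)
            return 0;          /* no carry: only the low block changes */
        ++*high;               /* low wrapped 0xffff -> 0: carry */
    } else {
        if ((*low)-- != 0)
            return 0;          /* no borrow */
        --*high;               /* low wrapped 0 -> 0xffff: borrow */
    }
    return 1;
}
```

Only a wrap of the low half touches the high block. That is the point of splitting a count into a 16-bit low half and a rarely written high half.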
2008-09-12 23:14 that's the one we want I think 2008-09-12 23:14 I think you'd actually want to spread it in a different way for performance reasons 2008-09-12 23:15 first all the low blocks, then all the high blocks 2008-09-12 23:15 since the high blocks will almost never be updated 2008-09-12 23:15 thus you want low blocks to be sequential on disk 2008-09-12 23:15 16 bit count with carry into the high block is a very nice balance between getting lots of atoms onto a block and not carrying too often 2008-09-12 23:15 there's no advantage to separating it that way 2008-09-12 23:15 that I know of 2008-09-12 23:15 a _very_ small advantage in the radix tree lookup 2008-09-12 23:16 but it was my first idea, what you said 2008-09-12 23:16 and I might stick with that indeed 2008-09-12 23:16 I think having two different ATOM_REFCOUNT_BLOCK would make for cleaner easier to understand code 2008-09-12 23:16 two different? 2008-09-12 23:16 ATOM_REFCOUNT_{LOW,HIGH}_BLOCK 2008-09-12 23:17 sure 2008-09-12 23:17 it's that way now 2008-09-12 23:17 before you posted that expression 2008-09-12 23:17 I haven't actually looked at the code ;-) 2008-09-12 23:17 just buggy 2008-09-12 23:17 ok 2008-09-12 23:17 give me an hour 2008-09-12 23:17 and you get to look at working, tested code 2008-09-12 23:17 I program much better when I'm not chatting 2008-09-12 23:18 ok, I think I'm gonna finally head home from work, and maybe reboot into mac os x and start up spore - and stop bothering you ;-) 2008-09-12 23:18 wow 2008-09-12 23:18 didn't realize you're camping out there 2008-09-12 23:18 but why not 2008-09-12 23:18 infinite junk food 2008-09-12 23:18 sweet sound of vacuum cleaners in the distance 2008-09-12 23:19 plus 30" screen 2008-09-12 23:19 not to mention my place is a studio 2008-09-12 23:19 that can be fixed 2008-09-12 23:19 just put in a ticket 2008-09-12 23:19 has an empty fridge (ok, I have spaghetti) 2008-09-12 23:19 you'll have one at home 2008-09-12 23:19 yah 2008-09-12 23:19 is a 
total mess 2008-09-12 23:19 you need to get that ramyun 2008-09-12 23:20 get it on the way home 2008-09-12 23:20 and is lacking (even after more than 2 years) basic amenities like a desk 2008-09-12 23:20 ouch 2008-09-12 23:20 just order one online 2008-09-12 23:20 ikea 2008-09-12 23:20 have it in 3 days 2008-09-12 23:20 because I've never found a good way to fit one in 2008-09-12 23:20 some assembly required 2008-09-12 23:20 that's the trick 2008-09-12 23:20 so to be fair, even back in Poland, when I had a desk, I spent most time with laptop on lap on bed 2008-09-12 23:20 and move to phoenix where you can have a real house ;) 2008-09-12 23:20 which is one of the reasons I've never bothered 2008-09-12 23:21 how do you say "hello" in polish? 2008-09-12 23:21 hallo? 2008-09-12 23:21 Dzień Dobry. 2008-09-12 23:21 is Good Day 2008-09-12 23:21 hello is 'halo' 2008-09-12 23:21 characters didn't work in xchat 2008-09-12 23:21 but that's kind of like pulled in from english 2008-09-12 23:22 that's the unaccented version of above? 2008-09-12 23:22 Dzien' Dobry 2008-09-12 23:22 n with accent like / 2008-09-12 23:22 got it 2008-09-12 23:22 like sheen dobry 2008-09-12 23:22 kinda 2008-09-12 23:22 with more bite 2008-09-12 23:22 and 'halo' is more like a phone greeting when you pick up then really hello 2008-09-12 23:23 ok, I'll try it on a pole tomorrow and see if it works 2008-09-12 23:23 you're more likely to use 'hej' (hey) between friends in person, and dzien dobry for more formal purposes and 'halo' when picking up the phone 2008-09-12 23:23 hmm 2008-09-12 23:23 let me write it down more phonetically 2008-09-12 23:23 ugh 2008-09-12 23:24 dobry = [hard d] [short o] [hard b] [hard trilled r] [short i/y] 2008-09-12 23:25 oh yeah I can say dobry 2008-09-12 23:25 even with the trill 2008-09-12 23:25 so dzien, is dzie - en - two syllables 2008-09-12 23:26 tshee - ehn 2008-09-12 23:26 ?
2008-09-12 23:26 were dzi is a consonant, e is a vowel and n' is a soft nasal n 2008-09-12 23:26 nice 2008-09-12 23:26 okay maybe more like one syllable, kind of hard to say because soft vowels (and there's two of them here) are kind of syllable like 2008-09-12 23:27 so the i in dzi takes the sound dz and makes it soft 2008-09-12 23:27 good enough to try 2008-09-12 23:27 I'll get corrected soon enough 2008-09-12 23:27 which is exactly the purpose of the accent on the n (accent on consonants is written as an i infront of a vowel - hence the dz "i" en' 2008-09-12 23:27 that's really dz'en' 2008-09-12 23:28 ah 2008-09-12 23:28 and dz is a single letter 2008-09-12 23:28 wow, more complex than say czech 2008-09-12 23:28 that just happens to be written with two 2008-09-12 23:28 since we ran out of latin characters 2008-09-12 23:29 hence dz rz sz cz ch should basically be treated as single letters 2008-09-12 23:29 and then there's sounds like drz brz and so on, which are almost like a single letter 2008-09-12 23:29 ugh 2008-09-12 23:29 with a trill? 2008-09-12 23:30 while you can theoretically read it as b-rz and d-rz, almost everybody pronounces is it quickly where it kind of melds into one 2008-09-12 23:30 and there's no 'r' in their ;-) 2008-09-12 23:30 since rz is actually 'z with a dot' ie. 
ż 2008-09-12 23:30 which is the first letter of my last name 2008-09-12 23:30 which is almost like the j in french 'je' 2008-09-12 23:30 so more like 'zh' 2008-09-12 23:31 IC 2008-09-12 23:31 it's actually very consistant 2008-09-12 23:31 very few words can't be read correctly if you know the relatively short set of rules 2008-09-12 23:31 there are very few exceptions 2008-09-12 23:31 I'll learn those rules 2008-09-12 23:31 over time 2008-09-12 23:32 slavic languages are fun 2008-09-12 23:32 great langues for being cynical in 2008-09-12 23:32 from what I have seen 2008-09-12 23:32 an example being frozen (zmarznięty), where the syllable split is zmar-znię-ty [the ę is nasal e, or e with a , ogonek accent - like french c with cedilla, but flipped other way] 2008-09-12 23:32 whoops 2008-09-12 23:32 none of those chars are working on braindead xchat 2008-09-12 23:32 so even though you see rz it ain't zh 2008-09-12 23:33 those were all the same character 2008-09-12 23:33 time for a less braindead chat client 2008-09-12 23:33 probably because I'm writing in utf8 2008-09-12 23:33 yes 2008-09-12 23:33 although they show up fine in pidgin 2008-09-12 23:33 but xchat should grok that 2008-09-12 23:33 xchat probably does 2008-09-12 23:33 xchat sucks at everything by default 2008-09-12 23:33 I'd guess your terminal is not utf8 or something 2008-09-12 23:33 I wouldn't assume taht 2008-09-12 23:33 it doesn't run in a terminal 2008-09-12 23:35 oh, it doesn't? 2008-09-12 23:35 oh right the x ;-) 2008-09-12 23:36 what are we on here?
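This is why the Polish letters turn into debris. In UTF-8, ń (U+0144) is sent as the two bytes 0xC5 0x84, and ę (U+0119) is sent as 0xC4 0x99. A client that decodes those bytes as Windows-1252 shows "Å„" and "Ä™" instead. The minimal C encoder below handles code points below U+0800, which covers every Polish letter, and makes the byte pairs concrete. It is a sketch, not a complete UTF-8 implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Encode one code point below U+0800 as UTF-8; returns the byte count.
 *   U+0000..U+007F -> 0xxxxxxx
 *   U+0080..U+07FF -> 110xxxxx 10xxxxxx
 */
size_t utf8_encode_below_800(unsigned cp, unsigned char out[2])
{
    assert(cp < 0x800);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));    /* top 5 bits */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));  /* low 6 bits */
    return 2;
}
```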
2008-09-12 23:36 -!- maze_pallas(~elbereth@216-239-45-4.google.com) has joined #tux3 2008-09-12 23:36 oftc 2008-09-12 23:36 let's see if my xchat works 2008-09-12 23:36 ąćęłńóśźż 2008-09-12 23:36 works 4 me 2008-09-12 23:36 ąćęłńóśźż 2008-09-12 23:36 yup 2008-09-12 23:37 ĄĆĘŁŃÓŚŹŻ 2008-09-12 23:37 xchat does utf8 great here 2008-09-12 23:37 ĄĆĘŁŃÓŚŹŻ 2008-09-12 23:37 well 2008-09-12 23:37 right 2008-09-12 23:37 I have to set something 2008-09-12 23:37 or upgrade 2008-09-12 23:37 but you probably need to have proper locale 2008-09-12 23:37 oh 2008-09-12 23:37 xchat_2.6.1-0ubuntu2_i386.deb here 2008-09-12 23:37 I have xchat 2.8.4 2008-09-12 23:37 freshly installed 2008-09-12 23:37 I thought unicode was supposed to be independent of locale 2008-09-12 23:38 -!- maze_(~maze@216-239-45-4.google.com) has joined #tux3 2008-09-12 23:38 testing ;-) 2008-09-12 23:38 ?????ó??? 2008-09-12 23:38 nope 2008-09-12 23:38 2.6.8-0.3 2008-09-12 23:38 etch 2008-09-12 23:38 :p 2008-09-12 23:38 try LC_ALL=en_US.utf-8 xchat as the startup command 2008-09-12 23:39 -!- maze__(~maze@216-239-45-4.google.com) has joined #tux3 2008-09-12 23:39 testing 2008-09-12 23:39 ĄĆĘŁŃÓŚŹŻ ąćęłńóśźż 2008-09-12 23:39 yep works 2008-09-12 23:39 it's gotten crowded in here 2008-09-12 23:40 I guess I don't really know what that means... 2008-09-12 23:40 amazed at all the maze clones 2008-09-12 23:40 you better get home 2008-09-12 23:40 I need to hear about spore 2008-09-12 23:40 but obviously something is broken if xchat's wire encoding depends on the locale 2008-09-12 23:41 maybe there's a switch or something 2008-09-12 23:42 I'm giving up on getting xchat to show those chars 2008-09-12 23:42 ah no idea 2008-09-12 23:42 just quit 2008-09-12 23:42 one day I will do a gui for irssi 2008-09-12 23:42 and restart with LC_ALL=en_US.utf-8 xchat 2008-09-12 23:42 keep promising myself 2008-09-12 23:42 decloned. 2008-09-12 23:43 going to try the above? or shall I head home?
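Restarting with `LC_ALL=en_US.utf-8` can help because C programs conventionally adopt the environment's locale with `setlocale(LC_ALL, "")` and then ask it which character set to assume. This log does not establish whether xchat 2.6 derives its IRC encoding exactly this way. The sketch only shows the standard POSIX mechanism (`setlocale`, `nl_langinfo(CODESET)`), and `environment_codeset` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>

/*
 * Adopt the locale named by LC_ALL / LC_CTYPE / LANG and return the
 * character set it implies (e.g. "UTF-8" for an installed en_US.utf-8).
 * Returns NULL if the environment names a locale that is not installed.
 */
const char *environment_codeset(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}
```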
2008-09-12 23:44 head home 2008-09-12 23:44 ok will do so then 2008-09-12 23:44 I'll have it working when you get there 2008-09-12 23:44 and you can start your fs 2008-09-12 23:44 or your spore review 2008-09-12 23:45 pick up some ramyun on the way in case it turns into a long one 2008-09-12 23:45 ok, might do so, there's a store on route after all... 2008-09-12 23:48 hey 2008-09-12 23:48 hi 2008-09-12 23:48 yeah, just got some crude gdb script to scan through all system threads and print out their state 2008-09-12 23:48 more to come 2008-09-12 23:49 this is to help with core examinations 2008-09-12 23:49 which it seems that nobody does under Linux 2008-09-12 23:49 true 2008-09-12 23:49 I have never 2008-09-12 23:49 should 2008-09-12 23:50 it's not a criticism, just a new tool to help with debugging 2008-09-12 23:50 I'm getting a bunch of nfsd threads in state 4 which is a bit odd 2008-09-12 23:51 it's about time I learned summa that fu 2008-09-12 23:53 summa ? 2008-09-12 23:53 =some of 2008-09-12 23:53 ok 2008-09-12 23:53 yap 2008-09-12 23:53 that's all we did at netapp practically 2008-09-12 23:53 for better or worse 2008-09-12 23:53 gotta get me summa data 2008-09-12 23:53 gotta get me summa dat 2008-09-13 00:13 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-13 00:13 test 2008-09-13 00:20 it didn't work 2008-09-13 00:33 ok, one nasty little issue with putting the atom tables in the atomdict... 
ext2_create_entry likes to rely on the inode->i_size to know how many dirent blocks there are 2008-09-13 00:34 now the poor thing thinks there are an awful lot of them 2008-09-13 00:52 so this being inside the filesystem, I can actually let it have stuff out past the end of i_size 2008-09-13 00:53 and let the dirops happilly continue using i_size to know how many dir blocks there are 2008-09-13 00:54 that's probably ok 2008-09-13 00:54 got to think about it 2008-09-13 00:55 now this is where I'd like somebody to be awake ;) 2008-09-13 00:55 I guess I'll go take a run around the pure track 2008-09-13 00:55 see if that focusses me 2008-09-13 01:10 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-13 02:17 -!- Aks(~ankitsriv@123.237.71.198) has joined #tux3 2008-09-13 02:27 should have pinged if you needed someone awake 2008-09-13 02:28 maze, do you know what pread is supposed to do with read past end of file? 2008-09-13 02:29 I would think, return zero 2008-09-13 02:29 ah, I'll ptrace ;) 2008-09-13 02:29 brilliant 2008-09-13 02:29 you could just right a tiny program to test it 2008-09-13 02:30 my guess is pread should behave exactly like lseek read 2008-09-13 02:31 probably EINVAL 2008-09-13 02:31 but testing would see 2008-09-13 02:33 it's supposed to return zero 2008-09-13 02:34 and in fact something else is going on 2008-09-13 02:34 so never mind I'll sort it 2008-09-13 02:34 how was spore? 2008-09-13 02:34 haven't gotten to it yet 2008-09-13 02:34 how's the new fs? 2008-09-13 02:34 going through open tabs in firefox and closing them 2008-09-13 02:34 right 2008-09-13 02:34 lots of stuff from the week to catch up on before I reboot 2008-09-13 02:34 yah 2008-09-13 02:34 I'm about 30% through... 
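For the record, a POSIX `pread()` that starts at or beyond end-of-file returns 0 rather than an error. Otherwise it behaves like `lseek()` plus `read()`, except that it leaves the file offset alone. The small self-checking sketch below demonstrates this. The helper name and the temp-file path are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Write len bytes of data to a fresh temporary file, then pread()
 * up to `want` bytes at `offset` and return pread's result.
 * Returns -1 if the temporary file cannot be set up.
 */
ssize_t pread_probe(const char *data, size_t len, off_t offset, size_t want)
{
    char path[] = "/tmp/preadXXXXXX";
    char buf[64];
    ssize_t n = -1;
    int fd;

    assert(want <= sizeof buf);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                          /* removed once fd is closed */
    if (write(fd, data, len) == (ssize_t)len)
        n = pread(fd, buf, want, offset);  /* does not move the file offset */
    close(fd);
    return n;
}
```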
2008-09-13 02:35 that's why it's good that firefox crashes 2008-09-13 02:35 it does down, you lose 150 tabs, you find out they didn't matter, you get hours of your life back 2008-09-13 02:36 nope it restarts with all the tabs 2008-09-13 02:36 thankfully 2008-09-13 02:36 I actually always killall firefox-bin instead of closing it 2008-09-13 02:36 ok, got it sorted 2008-09-13 02:36 that way the tabs don't get lost ;-) 2008-09-13 02:36 so how is it? 2008-09-13 02:36 it's because I running 32 bit fileops 2008-09-13 02:36 shifted the block number, passed zero to pread 2008-09-13 02:37 everything makes sense 2008-09-13 02:37 well, It's a little tricky to test this high offset sparse file stuff 2008-09-13 02:38 I'll find a way around it 2008-09-13 02:38 it still seems like a good thing to do 2008-09-13 02:40 there we are, compiled with 64 bit r/w and got the proper error 2008-09-13 02:40 % 2008-09-13 02:40 5 2008-09-13 02:40 EIO 2008-09-13 02:48 hmm, I don't think there's really any need to test in 32bit userspace if the kernels 64 bit, right? 2008-09-13 02:48 all the conversions happen way earlier at the syscall entry point... 2008-09-13 02:49 all combinations need testing eventually 2008-09-13 02:49 but true, you can live blissfully in 64 bit 2008-09-13 02:49 never worry about 32 bit 2008-09-13 02:50 yes. all combos need testing, but much later 2008-09-13 02:50 right 2008-09-13 02:50 I don't think that's something that needs worrying in the dev stage 2008-09-13 02:50 I'll stay in 32 bit 2008-09-13 02:50 it's the most demanding 2008-09-13 02:50 32 bit kernel? 2008-09-13 02:50 yes 2008-09-13 02:50 ah 2008-09-13 02:50 shapor runs 64 bit 2008-09-13 02:50 most others do 2008-09-13 02:50 so do I 2008-09-13 02:50 but if it doesn't work on 32 bit it doesn't exist 2008-09-13 02:50 I actually run an interesting system 2008-09-13 02:51 45 bit fedora 8.5 2008-09-13 02:51 45? 
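The bug found here is easy to reproduce in isolation: a block number was shifted in 32-bit arithmetic, so `pread()` saw offset 0. In the sketch below, whose function names are illustrative, widening to 64 bits before the shift keeps the high bits. On 32-bit glibc builds, defining `_FILE_OFFSET_BITS` as 64 before any `#include` is what gives `off_t`, `pread()` and `pwrite()` 64-bit offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: the shift is done in 32 bits, so the high bits fall off. */
uint64_t block_to_offset_32(uint32_t block, unsigned blockbits)
{
    assert(blockbits < 32);
    return (uint32_t)(block << blockbits);
}

/* Fixed: widen to 64 bits first, then shift. */
uint64_t block_to_offset_64(uint32_t block, unsigned blockbits)
{
    assert(blockbits < 32);
    return (uint64_t)block << blockbits;
}
```

With 4 KiB blocks, block 2^28 is byte offset 2^40. The 32-bit version turns that into 0, which is exactly the "passed zero to pread" symptom.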
2008-09-13 02:51 yeah, a joke 2008-09-13 02:51 haha 2008-09-13 02:51 I run a 23 db system 2008-09-13 02:51 it's a 32-bit fedora 8 system, with 64-bit kernel installed, some stuff to support that, and a 64-bit compiler, than upgraded to fedora 9, than more stuff upgraded to 64-bits 2008-09-13 02:51 that's the most important thing about it from my point of view 2008-09-13 02:51 wait, sorry 2008-09-13 02:52 it's 28 db 2008-09-13 02:52 hmm... 2008-09-13 02:52 let me see what it really is 2008-09-13 02:52 mine's a macbook pro laptop, so it's almost silent unless I run the procs full throttle 2008-09-13 02:52 I also have a headless (well with projector) box, which is also very quiet, since it's a shuttle 2008-09-13 02:53 anyway since my laptop is mostly 32 bits, I felt avg of 32 + 64 = 48 was not right 2008-09-13 02:53 so instead went with geometric mean sqrt(32 * 64) = sqrt(2) * 32 = 1.4 *32 = 32 + 12.8 = 44.8 ~ 45 2008-09-13 02:54 and that's why it's a 45 bit fedora 8.5 system 2008-09-13 02:54 brilliant logic, ain't it? 2008-09-13 02:55 right 2008-09-13 02:55 29 dba @ 1 meter 2008-09-13 02:55 ok night 2008-09-13 02:55 and it's still too noisy for me 2008-09-13 02:55 I want 27 now 2008-09-13 02:55 hard to get 2008-09-13 02:55 flips: good luck with design and coding as usual :) 2008-09-13 02:55 heh, that's why I use a laptop, and a headless box 2008-09-13 02:55 this is quieter than any laptop I've used 2008-09-13 02:55 the box can be on the other end of the room 2008-09-13 02:55 considerably 2008-09-13 02:56 really? even when you run with no cpu consumption on a laptop and spun down drives? 
2008-09-13 02:56 spun down ok 2008-09-13 02:56 the hard drive is the noisiest component 2008-09-13 02:56 and I got the quiest ones on the market 2008-09-13 02:56 remember my laptop is basically a remote X/xterm/ssh server 2008-09-13 02:56 quietest 2008-09-13 02:56 sure 2008-09-13 02:57 but the drive doesn't stay spun down 2008-09-13 02:57 yeah, I actually use flash for some things 2008-09-13 02:57 in my experience 2008-09-13 02:57 and don't like things suddenly going "whirr" either ;-) 2008-09-13 02:57 quiet means, makes no noise 2008-09-13 02:57 like the root fs part that is read only (8gb), with tons of the non-permanent stuff living in tmpfs (although no swap) 2008-09-13 02:58 ok, you're hardcore 2008-09-13 02:58 I'd expect no less 2008-09-13 02:58 that way I have ro root fs with 8 gb, tmpfs with /var pieces 2008-09-13 02:58 my favorite box here is the fit pc 2008-09-13 02:58 no fan 2008-09-13 02:58 has a quiet 2.5 in drive 2008-09-13 02:58 ah 2008-09-13 02:58 which will be replaced with a 2.5 in flash drive 2008-09-13 02:58 pretty soon 2008-09-13 02:59 my daughter's favorite box too 2008-09-13 02:59 I actually have my flash drive in a raid array with an identical size partition on the hard disk 2008-09-13 02:59 has a fine linux distro on it 2008-09-13 02:59 didn't have to do a thing 2008-09-13 02:59 with the hard drive part set to raid mode 'write_mostly' 2008-09-13 02:59 that way if you pull the flash it falls back to the hard drive 2008-09-13 02:59 than you can remount,rw 2008-09-13 02:59 update the system 2008-09-13 02:59 ok, now I have this little logical problem 2008-09-13 02:59 remount,ro 2008-09-13 02:59 put the flash back 2008-09-13 02:59 in 2008-09-13 03:00 and then 8gbs sync to flash in one go 2008-09-13 03:00 - presto wear levelling solved even with ext3 ;-) 2008-09-13 03:00 cool 2008-09-13 03:00 that's md? 
2008-09-13 03:00 must be 2008-09-13 03:00 dm can't sync ;-) 2008-09-13 03:00 once both are synced, there are no writes (read-only mount), and the hdd is write-mostly, so all reads hit swap 2008-09-13 03:00 yup it's md 2008-09-13 03:00 erm 2008-09-13 03:01 not swap - flash 2008-09-13 03:01 ddraid project is going to get going pretty soon 2008-09-13 03:01 cluster raid 2008-09-13 03:01 been gathering dust for some time 2008-09-13 03:01 nice and quiet - and fast - since seek time is awesome 2008-09-13 03:01 but it's going to be really useful 2008-09-13 03:01 and linear read is pretty much the same as a normal 2.5 inch drive 2008-09-13 03:01 (25mb/s) 2008-09-13 03:02 which drive is it? 2008-09-13 03:02 and since it's an expresscard 8gb flash - it doesn't stick out of the notebook - just sits nestled within the cavity 2008-09-13 03:02 ah 2008-09-13 03:02 uhm, some no name I picked up off of ebay for like 40 bucks a year and a half back 2008-09-13 03:02 seen em at fry's 2008-09-13 03:03 need 32 gb I think 2008-09-13 03:03 and with the drive spun down, there's less heat - thus less need for fans 2008-09-13 03:03 I don't really want to do work on a smaller one 2008-09-13 03:03 remember - this is just the OS 2008-09-13 03:03 data lives in the cloud 2008-09-13 03:03 in this case on the headless box 2008-09-13 03:04 nice little cloud 2008-09-13 03:04 a cumullo closetus 2008-09-13 03:04 or when at work... it lives elsewhere ;-) 2008-09-13 03:04 ok, my logistical problem 2008-09-13 03:04 and of course email and so on, are already in the cloud to begin with 2008-09-13 03:04 I'm testing this gigantic sparse file stuff 2008-09-13 03:04 and my linux oss can't do those big files 2008-09-13 03:04 my fs 2008-09-13 03:05 let me see 2008-09-13 03:05 why not? 2008-09-13 03:05 what kernel? 2008-09-13 03:05 oss? 2008-09-13 03:05 I'm writing stuff at 2^40 bytes out 2008-09-13 03:05 was that a typo for os? 
2008-09-13 03:05 yes 2008-09-13 03:05 uhm, compiling 32-bit userspace with 32-bit kernel? 2008-09-13 03:05 so, ext3 just can't do that 2008-09-13 03:05 does 34 bits work? 2008-09-13 03:05 right, 32/32 2008-09-13 03:05 no 2008-09-13 03:06 2^40, see? 2008-09-13 03:06 tux3 is 2^48 2008-09-13 03:06 tux3 can do it 2008-09-13 03:06 asking whether 34 works 2008-09-13 03:06 even mapped into a loopback file 2008-09-13 03:06 34 doesn't 2008-09-13 03:06 um 2008-09-13 03:06 wait 2008-09-13 03:06 then your problem is compile options 2008-09-13 03:06 no, 34 should be ok 2008-09-13 03:06 #define USE_LARGEFILEOFFSET 64 or so 2008-09-13 03:06 yep, done 2008-09-13 03:07 the problem is the loopback file 2008-09-13 03:07 so 34 works? 2008-09-13 03:07 well 2008-09-13 03:07 34 is 16gb 2008-09-13 03:07 that should definitely work 2008-09-13 03:07 see 2^40 above 2008-09-13 03:07 33 is 8gb - that I know works - since that's dvd images ;-) 2008-09-13 03:07 well, that's why I'm asking if you've tested with 33 2008-09-13 03:07 terabyte 2008-09-13 03:07 no 2008-09-13 03:07 I guess I will 2008-09-13 03:07 then probably worth checking ;-) 2008-09-13 03:07 but I can't leave it that way 2008-09-13 03:08 not satisfactory 2008-09-13 03:08 if it doesn't work at 33, your problem is not in the kernel 2008-09-13 03:08 sure, I can do 15 minutes of testing with a much lower offset 2008-09-13 03:08 or half an hour 2008-09-13 03:08 but I need to write real code 2008-09-13 03:08 if it works at 33, but not at 40, then you've got a kernel internal problem 2008-09-13 03:08 it works fine 2008-09-13 03:08 like I said, this is just logical 2008-09-13 03:09 flips: btw, mmLinux was my first kernel project ever 2008-09-13 03:09 wait a minute 2008-09-13 03:09 just so that you know 2008-09-13 03:09 I think 2 TB is the biggest sparse file I can make on this system 2008-09-13 03:09 we're talking about the size of a file right? 
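The bisection maze suggests separates a userspace `off_t` problem from a host filesystem limit: try 2^33, then 2^34, and so on up to 2^40. Below is a sketch of one probe, assuming a writable `/tmp`. It writes a single byte at the given offset, which leaves a sparse file, and reports the resulting size. It returns -1 if the write is refused, for example with `EFBIG`. The helper name is hypothetical.

```c
#define _FILE_OFFSET_BITS 64      /* 64-bit off_t even on 32-bit hosts */
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Write one byte at `offset` in a fresh temporary file (everything
 * before it stays a hole) and return st_size, i.e. offset + 1,
 * or -1 if the file cannot be created or the write fails.
 */
int64_t sparse_write_size(off_t offset)
{
    char path[] = "/tmp/sparseXXXXXX";
    struct stat st;
    int64_t size = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    if (pwrite(fd, "x", 1, offset) == 1 && fstat(fd, &st) == 0)
        size = (int64_t)st.st_size;
    close(fd);
    return size;
}
```

Suppose 2^33 already fails. Then the build is most likely not using 64-bit offsets. If 2^33 passes and a larger offset fails, the limit most likely sits in the host filesystem.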
2008-09-13 03:09 I figured it was either going to make me or kil me 2008-09-13 03:09 bh, looks like a fine project 2008-09-13 03:09 right 2008-09-13 03:09 I've done well with the rap that it's given me 2008-09-13 03:09 yes 2008-09-13 03:09 that makes sense it's probably 4GB * 512 or something 2008-09-13 03:09 I didn't know about it at all 2008-09-13 03:09 however, I think I can do better 2008-09-13 03:10 maze, exactly 2008-09-13 03:10 lame 2008-09-13 03:10 it's the blocks count in the ext3 inode 2008-09-13 03:10 http://en.wikipedia.org/wiki/Ext3 2008-09-13 03:10 yeah, I was in the middle of the entire -rt thing and I got forgotten about, dropped out of existence 2008-09-13 03:10 see file size limit = 2tb 2008-09-13 03:10 measured in sectors instead of blocks, lamest idea ever 2008-09-13 03:10 unfixable 2008-09-13 03:10 apparently 2008-09-13 03:11 use jfs 2008-09-13 03:11 ok, so the correct way to test this is to boot tux3 up far enough that I can do the testing in tux3 files 2008-09-13 03:11 lol 2008-09-13 03:11 which go up to 2^48 (true) 2008-09-13 03:11 I wasn't joking 2008-09-13 03:11 been doing that already 2008-09-13 03:11 jfs does 4pb which is 52 bits 2008-09-13 03:11 for a few weeks even 2008-09-13 03:11 ah 2008-09-13 03:12 that's another option 2008-09-13 03:12 but it would limit people's abiltiy to test 2008-09-13 03:12 or xfs 2008-09-13 03:12 8 exabyte limit 2008-09-13 03:12 which is 63 bits 2008-09-13 03:12 it's important that tux3 builds and tests on the lowest common denominator linux system 2008-09-13 03:12 can't require a specific host fs 2008-09-13 03:12 you can if you stick it in a loopback 2008-09-13 03:12 I am 2008-09-13 03:13 that get's me up to 2 TB on ext3 2008-09-13 03:13 then you merely need to format the loopback with jfs 2008-09-13 03:13 can't expect users to do that 2008-09-13 03:13 oh, sparse loopback 2008-09-13 03:13 I thought base fs -> file -> loopback -> jfs -> sparse file -> loopback -> tux3 2008-09-13 03:13 but that does get 
complex 2008-09-13 03:13 and probably blows the stack 2008-09-13 03:13 can't expect the user even to have jfs 2008-09-13 03:14 it's not compiled in by default 2008-09-13 03:14 well 2008-09-13 03:14 default on fedora 2008-09-13 03:14 probably comes in modules on most recent distros 2008-09-13 03:14 ok 2008-09-13 03:14 and ubuntu too, in a module I expect 2008-09-13 03:14 but still 2008-09-13 03:14 module of course 2008-09-13 03:14 then they have to mess with a fragile loopback 2008-09-13 03:14 it's pi o'clock 2008-09-13 03:14 and really tux3 can do that by itself 2008-09-13 03:14 heh 2008-09-13 03:14 so it is 2008-09-13 03:15 what are you plans for october 31? 2008-09-13 03:15 uhm 2008-09-13 03:15 it's a friday 2008-09-13 03:15 halloween 2008-09-13 03:15 looks like it's a friday 2008-09-13 03:16 looks like none at the moment 2008-09-13 03:16 I'm thinking of arranging an official cabal meeting that day 2008-09-13 03:16 were? 2008-09-13 03:16 just an idea 2008-09-13 03:16 rather: where? 2008-09-13 03:17 somewhere in the fear and loathing of LA 2008-09-13 03:17 ugh, would need to drive down... 2008-09-13 03:17 with a web presence 2008-09-13 03:17 unless we organized it somewhere mid-way 2008-09-13 03:17 possible 2008-09-13 03:17 like I say, just an idea at the moment 2008-09-13 03:18 bay area certainly could be good for attendance 2008-09-13 03:18 where are you? 
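Here are the file-size limits traded in the exchange above, worked out as powers of two. This takes the channel's explanation of the ext3 cap (a 32-bit count of 512-byte sectors) at face value. It also reads "4pb" and "8 exabyte" as binary PiB/EiB, which is what makes the quoted bit counts come out exact.

$$
\begin{aligned}
\text{ext3: } & 2^{32} \times 512\ \text{B} = 2^{32} \times 2^{9}\ \text{B} = 2^{41}\ \text{B} = 2\ \text{TiB} \\
\text{tux3: } & 2^{48}\ \text{B} = 2^{8} \times 2^{40}\ \text{B} = 256\ \text{TiB} \\
\text{JFS: } & 4\ \text{PiB} = 2^{2} \times 2^{50}\ \text{B} = 2^{52}\ \text{B} \\
\text{XFS: } & 8\ \text{EiB} = 2^{3} \times 2^{60}\ \text{B} = 2^{63}\ \text{B}
\end{aligned}
$$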
2008-09-13 03:18 santa monica 2008-09-13 03:18 socal 2008-09-13 03:18 something like santa maria 2008-09-13 03:19 close 2008-09-13 03:19 or grover beach 2008-09-13 03:19 I'm sure those saints all live near each other 2008-09-13 03:19 that was a suggestion of a mid-way meet point 2008-09-13 03:19 right next to the beach 2008-09-13 03:19 indeed 2008-09-13 03:19 oh 2008-09-13 03:19 I see 2008-09-13 03:20 it's outside the LA basin 2008-09-13 03:20 outside of LA jams 2008-09-13 03:20 so you deal with LA 2008-09-13 03:20 we norcal folks get a larger distance to travel 2008-09-13 03:21 I'd probably take highway 1 and leave 3 hours early - since I love that drive... but oh well ;-) 2008-09-13 03:21 150 miles on pch... 2008-09-13 03:21 yes, it's not too far 2008-09-13 03:21 anyway, just an idea 2008-09-13 03:21 that's 130 miles from you, 230 from me 2008-09-13 03:22 ok I know what I'll do 2008-09-13 03:22 something around there, some restaurant? 2008-09-13 03:22 about my logistical problem 2008-09-13 03:22 I'll make the position of the atom refcount map a variable in the superblock 2008-09-13 03:22 and set it really low for unit testing 2008-09-13 03:22 duh 2008-09-13 03:23 yes, something like that 2008-09-13 03:23 130 miles I can handle 2008-09-13 03:23 you're younger, can handle further 2008-09-13 03:23 I need to check with such folks as natalie 2008-09-13 03:23 we could do this as an LA-only event 2008-09-13 03:23 or try for larger coverage 2008-09-13 03:24 where's she from? 2008-09-13 03:24 ukraine 2008-09-13 03:24 lives in LA 2008-09-13 03:24 I brought her into goog 2008-09-13 03:24 looks like SM 2008-09-13 03:24 goog's lucky about that 2008-09-13 03:24 most likely 2008-09-13 03:24 we can patch you in ;-) 2008-09-13 03:25 patch as in? 2008-09-13 03:25 remotey 2008-09-13 03:25 that's another option 2008-09-13 03:25 to do it with a web presence 2008-09-13 03:25 oh, as in organize it in the office? to vc? 
2008-09-13 03:25 the idea is to have some, um, ethanol involved 2008-09-13 03:26 not quite 2008-09-13 03:26 let's have an email loop 2008-09-13 03:26 cause I very heavily doubt I have the net uplink for any decent vc at home 2008-09-13 03:26 about it 2008-09-13 03:26 it's just comsucktik 2008-09-13 03:26 yes 2008-09-13 03:26 well 2008-09-13 03:42 just a question: do you test tux3 on 64bit? 2008-09-13 03:44 because it seems that all I get are error messages :) 2008-09-13 03:45 or maybe I need a newer fuse-version. Which one are you using? 2008-09-13 03:45 2.7.3 here 2008-09-13 03:53 flips is on 2.7.4 2008-09-13 03:53 and shapor and I are on 64 bit 2008-09-13 03:53 fuse version is set to 27 2008-09-13 03:53 hmm... I thought at least creating files should work under fuse, shouldn't it? 2008-09-13 03:53 in the source 2008-09-13 03:54 it should 2008-09-13 03:54 post your error? 2008-09-13 03:54 data, you have a web paste utility you use? 2008-09-13 03:55 well, there are a few, but not a single one 2008-09-13 03:55 which do you prefer? 2008-09-13 03:55 any 2008-09-13 03:55 just paste your output there 2008-09-13 03:55 and let konrad go at it ;-) 2008-09-13 03:55 heh 2008-09-13 03:56 I think tux3fuse is the toy to use 2008-09-13 03:56 I can't see tux3fs as being useful anymore with the low level one there 2008-09-13 03:56 desktop test # touch test 2008-09-13 03:56 desktop test # ls 2008-09-13 03:56 desktop test # 2008-09-13 03:56 nothing shows up 2008-09-13 03:56 konrad, I'll leave that question to you and shapor 2008-09-13 03:56 first error. 
2008-09-13 03:56 data: at least it doesn't crash :) 2008-09-13 03:56 I think you're right but I'm not the expert 2008-09-13 03:57 pff, I first looked at fuse the day I sent that email 2008-09-13 03:57 that's more than me 2008-09-13 03:57 still haven't looked at it 2008-09-13 03:57 heh 2008-09-13 03:57 -su: echo: write error: Transport endpoint is not connected 2008-09-13 03:57 atom refcounting getting closer 2008-09-13 03:58 echo "foo" > bar 2008-09-13 03:58 means fuse didn't start 2008-09-13 03:58 run with -f 2008-09-13 03:58 that is, make defuse 2008-09-13 03:58 right. using that 2008-09-13 03:59 and it hangs like it should? 2008-09-13 03:59 anyway, you want to paste all the output 2008-09-13 03:59 there should be lots 2008-09-13 03:59 hm, I'm not getting tux3fuse to mount anything here 2008-09-13 03:59 http://www.nomorepasting.com/getpaste.php?pasteid=20198 2008-09-13 04:00 when I do: echo "foo" > bar 2008-09-13 04:00 the steps are something like: dd if=/dev/zero of=./dev seek=100M count=1; ./tux3 mkfs ./dev; ./tux3fuse dev tmp/ 2008-09-13 04:00 yes? 2008-09-13 04:00 oh, segment fault 2008-09-13 04:01 I'm not even getting that 2008-09-13 04:01 i just used make defuse 2008-09-13 04:01 you want to find where the segfault is 2008-09-13 04:01 tux3_init: fdsize64 failed for 'dev' (Bad file descriptor)! 2008-09-13 04:01 konrad's on it ;) 2008-09-13 04:01 nah I got to get to sleep 2008-09-13 04:02 and i have to do more algebra (bah!) 2008-09-13 04:02 and I'm moving in ~5 days so I may be otherwise occupied come this tuesday evening 2008-09-13 04:02 data, try: sudo gdb -args ./tux3fuse /tmp/testdev /tmp/test -f 2008-09-13 04:02 slight variation 2008-09-13 04:03 run under gdb 2008-09-13 04:03 0x0000000000406239 in xcache_limit (xcache=0x0) at tux3.h:284 2008-09-13 04:03 284 return (void *)xcache + xcache->size; 2008-09-13 04:03 yup it's a bug 2008-09-13 04:04 shall we chase it tomorrow? 2008-09-13 04:04 it's the middle of the day? 
:P 2008-09-13 04:04 oh right 2008-09-13 04:04 well 2008-09-13 04:04 data: we're in PST, it's 4am for flips and I :D 2008-09-13 04:04 we need to get a tux3 debug center going over there in europe 2008-09-13 04:04 i'll have a look at it if I find the time 2008-09-13 04:04 ok 2008-09-13 04:04 good luck 2008-09-13 04:04 it's just a bug ;-) 2008-09-13 04:04 but otherwise I'll be around tomorrow 2008-09-13 04:05 now you can run under gdb, makes it easier to chase 2008-09-13 08:21 -!- pgquiles(~pgquiles@229.Red-83-49-101.dynamicIP.rima-tde.net) has joined #tux3 2008-09-13 10:10 -!- Aks(~ankitsriv@123.237.71.198) has left #tux3 2008-09-13 12:22 well, xattr get/set actually seem to be an interesting method for extended operations on inodes 2008-09-13 13:26 maze, how is the new fs going? 2008-09-13 13:26 writing the makefile... 2008-09-13 13:27 that's most of the work, if your write the fs in "make" and use fuse 2008-09-13 13:27 you didn't sleep much 2008-09-13 13:28 starting from the makefile ;-) 2008-09-13 13:28 want something that compiles 2008-09-13 13:28 and I'm writing straight in kernel-space 2008-09-13 13:28 hardcore 2008-09-13 13:28 I'd like to have a build-debug environment 2008-09-13 13:29 good exercise 2008-09-13 13:29 hey, I'm not doing this to test a concept of a fs 2008-09-13 13:29 but to learn the API 2008-09-13 13:29 I know 2008-09-13 13:29 I'm excited about that 2008-09-13 13:29 you're probably going to be telling me about it in a week 2008-09-13 13:29 things I didn't know and should have ;) 2008-09-13 13:30 one would only hope... 
2008-09-13 13:30 but right now, I'm not even getting it to compile a module ;-) 2008-09-13 13:30 I'm also expecting to hear some swearing in the channel 2008-09-13 13:30 the secret, the way everybody starts a new fs: cut and paste ramfs 2008-09-13 13:31 even lazier people cut and paste tux2 2008-09-13 13:31 sorry 2008-09-13 13:31 ext2 2008-09-13 13:31 I'm lazy - but I think that would be counterproductive 2008-09-13 13:31 I'm starting with a clean slate, with ramfs/tmpfs/ext2 as cut-n-paste sources 2008-09-13 13:31 but planning on writing it all 2008-09-13 13:32 I want to understand every line of code 2008-09-13 13:32 and the only way to do that is to write it yourself... 2008-09-13 13:32 well - I've got a working makefile. 2008-09-13 13:32 of course it currently doesn't build any modules... 2008-09-13 13:32 ugh 2008-09-13 13:32 and use jon corbet's examples 2008-09-13 13:32 so maybe the definition of working is more like 'it doesn't report parse errors' 2008-09-13 13:32 there is a particularly good example from linux device drivers on building a minimal module 2008-09-13 13:35 http://lwn.net/Articles/21817/ 2008-09-13 13:35 enjoy 2008-09-13 13:35 hmm 2008-09-13 13:35 this was around the time rusty fscked with the module system and messed it all up 2008-09-13 13:38 don't neglect to have a close look at my use_atom code I just posted to the list, the way it handles the positive and negative carries between shorts might be interesting to you 2008-09-13 13:38 a form of bit bashing you don't see much these days 2008-09-13 13:38 clumsy in c 2008-09-13 13:40 okay, have a junkfs.ko 2008-09-13 13:41 well it loads into running kernel (yay for testing on machine you're working on) 2008-09-13 13:41 and unloads 2008-09-13 13:41 of course all it has is empty init/exit 2008-09-13 13:42 [maze@nike junkfs]$ make clean 2008-09-13 13:42 rm -f *~ *.o *.ko *.mod.c .*.cmd 2008-09-13 13:42 rm -f modules.order .depend .version .*.o.flags .*.o.d 2008-09-13 13:42 rm -rf .tmp_versions
2008-09-13 13:42 rm -f Module.markers Module.symvers 2008-09-13 13:42 [maze@nike junkfs]$ make 2008-09-13 13:42 make -C /lib/modules/2.6.26.3-29.fc9.x86_64/build SUBDIRS=/home/maze/junkfs modules 2008-09-13 13:42 make[1]: Entering directory `/usr/src/kernels/2.6.26.3-29.fc9.x86_64' 2008-09-13 13:42 CC [M] /home/maze/junkfs/super.o 2008-09-13 13:42 LD [M] /home/maze/junkfs/junkfs.o 2008-09-13 13:42 Building modules, stage 2. 2008-09-13 13:42 MODPOST 1 modules 2008-09-13 13:42 CC /home/maze/junkfs/junkfs.mod.o 2008-09-13 13:42 LD [M] /home/maze/junkfs/junkfs.ko 2008-09-13 13:42 make[1]: Leaving directory `/usr/src/kernels/2.6.26.3-29.fc9.x86_64' 2008-09-13 13:42 (reverse-i-search)`modp': modprobe ath_pci 2008-09-13 13:42 [maze@nike junkfs]$ /sbin/lsmod | egrep ju 2008-09-13 13:42 [maze@nike junkfs]$ sudo /sbin/insmod ./junkfs.ko 2008-09-13 13:42 [maze@nike junkfs]$ /sbin/lsmod | egrep ju 2008-09-13 13:42 junkfs 9856 0 2008-09-13 13:42 [maze@nike junkfs]$ sudo /sbin/rmmod junkfs 2008-09-13 13:42 [maze@nike junkfs]$ /sbin/lsmod | egrep ju 2008-09-13 13:42 [maze@nike junkfs]$ 2008-09-13 13:50 cat /proc/filesystems | grep junk 2008-09-13 14:02 ok, atom reverse mapping then we are done with atoms for a while 2008-09-13 14:04 ok, printk debugging v0.1 ready 2008-09-13 14:05 moving to v0.2 2008-09-13 14:05 @/home/maze/junkfs/super.c:26 - Entering: init_junk_fs() 2008-09-13 14:05 @/home/maze/junkfs/super.c:27 - Exiting: init_junk_fs() 2008-09-13 14:05 @/home/maze/junkfs/super.c:32 - Entering: exit_junk_fs() 2008-09-13 14:05 @/home/maze/junkfs/super.c:33 - Exiting: exit_junk_fs() 2008-09-13 14:05 registered it yet? 
2008-09-13 14:05 guess not 2008-09-13 14:05 or you would have grepped 2008-09-13 14:06 that would be v0.0.2 I think 2008-09-13 14:06 well 2008-09-13 14:06 that's just me ;) 2008-09-13 14:07 you're building in your home directory, most hacks build right in a kernel tree 2008-09-13 14:07 so you can git the whole tree 2008-09-13 14:08 I don't even have the source for the kernel ;-) 2008-09-13 14:08 it's just the normal fedora core 9 kernel from koji 2008-09-13 14:08 leet 2008-09-13 14:08 sploit time 2008-09-13 14:08 no, working on making it verbose 2008-09-13 14:08 that is in fact how I started tux2 2008-09-13 14:09 worked with modules up until I realized I was going to be bringing down my workstation a lot 2008-09-13 14:09 well, next step will be kvm 2008-09-13 14:09 you're moving along 2008-09-13 14:10 well, first still have to get debugging more verbose 2008-09-13 14:10 it's not dumping function entry or exit values 2008-09-13 14:10 ltt? 2008-09-13 14:10 and after all - the entire point of this exercise is to learn the api 2008-09-13 14:10 which means seeing what gets passed in 2008-09-13 14:10 (and out) 2008-09-13 14:11 plus it makes debugging easier 2008-09-13 14:17 ok, I have to "unbundle" the ext2 dirops so I can find out which block and offset it created a new dirent at 2008-09-13 14:17 easiest way is to return the dirent and buffer I guess 2008-09-13 14:17 and to be able to search a given dirent block 2008-09-13 14:17 probably how it should have been written in the first place 2008-09-13 14:25 Current debug output: 2008-09-13 14:25 @/home/maze/junkfs/super.c:44 - Entering: init_junk_fs() 2008-09-13 14:25 @/home/maze/junkfs/super.c:39 - Entering: test(5, 6) 2008-09-13 14:25 @/home/maze/junkfs/super.c:40 - Exiting: test(...) = 0 2008-09-13 14:25 @/home/maze/junkfs/super.c:46 - Exiting: init_junk_fs(...) = 0 2008-09-13 14:25 @/home/maze/junkfs/super.c:50 - Entering: exit_junk_fs() 2008-09-13 14:25 @/home/maze/junkfs/super.c:51 - Exiting: exit_junk_fs(...) 
2008-09-13 14:26 here's the code: 2008-09-13 14:26 static int test (int a, int b) { 2008-09-13 14:26 <------>DBG_ENTER2(int,a,int,b); 2008-09-13 14:26 <------>DBG_RETURN1(int,0); 2008-09-13 14:26 } 2008-09-13 14:26 static int __init init_junk_fs(void) { 2008-09-13 14:26 <------>DBG_ENTER0(); 2008-09-13 14:26 <------>test(5, 6); 2008-09-13 14:26 <------>DBG_RETURN1(int,0); 2008-09-13 14:26 } 2008-09-13 14:26 static void __exit exit_junk_fs(void) { 2008-09-13 14:26 <------>DBG_ENTER0(); 2008-09-13 14:26 <------>DBG_RETURN0(); 2008-09-13 14:26 } 2008-09-13 14:26 what are those funny minuses? 2008-09-13 14:26 tabs? 2008-09-13 14:26 oh, that's tabs 2008-09-13 14:27 dark blue on darker blue background, but they show up fine after pasting 2008-09-13 14:27 you need spaces after your commas or my head will explode getting yucky goo everywhere 2008-09-13 14:28 ok, you're ready to register/unregister your fs 2008-09-13 14:28 right, also need to force func declare and debug into the same line I think, and dedup it 2008-09-13 14:28 will total about 6 lines 2008-09-13 14:28 right. 2008-09-13 14:28 plus a couple for a stub fill_super 2008-09-13 14:28 well, and the filesystem_type decl ;-) 2008-09-13 14:28 it starts to bloat 2008-09-13 14:28 of course ;-) 2008-09-13 14:29 I'd go with the separate code and debug lines 2008-09-13 14:31 so, the following isn't better? 
2008-09-13 14:31 DECLARE0(static,int,__init,init_junk_fs) 2008-09-13 14:31 <------>test(5, 6); 2008-09-13 14:31 <------>DBG_RETURN1(int, 0); 2008-09-13 14:31 } 2008-09-13 14:31 I'm not sure myself 2008-09-13 14:31 separate lines means it's easier to turn off on a per-function basis 2008-09-13 14:32 makes my eyes bleed 2008-09-13 14:32 if not sure, go with the unbundled form 2008-09-13 14:32 oh right spaces ;-) 2008-09-13 14:32 right 2008-09-13 14:32 separate lines rules the world for debug traces 2008-09-13 14:33 got to remember, we're writing in C, don't try to make it pretty, you will not succeed, and if it doesn't look ugly then the fates will not smile upon you 2008-09-13 14:34 yeah, but for return unless I use a temp variable, I kind of have to return from within the macro 2008-09-13 14:34 This is what test looks like now: 2008-09-13 14:34 DECLARE2(static, int, , test, int, a, int, b) 2008-09-13 14:34 DBG_RETURN1(int, 0); 2008-09-13 14:34 } 2008-09-13 14:35 hmm, requires a little thinking, and I'm hungry 2008-09-13 14:35 try ltt 2008-09-13 14:35 it does this for you 2008-09-13 14:35 what's ltt? 2008-09-13 14:35 so you can concentrate on the problem 2008-09-13 14:35 linux trace toolkit 2008-09-13 14:36 yes, google found it 2008-09-13 14:37 ok, I have a better idea than shelling ext2_create_entry to return buffer and dirent...
instead of an error code, return the dir file pos 2008-09-13 14:37 or -1 if there was an error 2008-09-13 14:37 the only error ext2_create_entry returns anyway is -EIO 2008-09-13 14:37 just more braindamaged C style error handling, or lack of it 2008-09-13 14:37 I'm not making it worse, honest 2008-09-13 14:46 oh, right this is C 2008-09-13 14:46 can't declare vars in the middle of func body 2008-09-13 14:46 you can 2008-09-13 14:46 well 2008-09-13 14:47 you have to override the kernel compile flags 2008-09-13 14:47 so you don't get warnings 2008-09-13 14:47 we might build tux3 that way for a while 2008-09-13 14:47 folks 2008-09-13 14:48 until the squawking from old schoolers gets too much to bear 2008-09-13 14:48 I can't remember what the reason for not using C++ was in the kernel? 2008-09-13 14:48 was it the programmers? 2008-09-13 14:49 or were there actual issues with the compiler 2008-09-13 14:49 fear of exceptions and crazy hidden semantics 2008-09-13 14:49 no real issues 2008-09-13 14:49 I know you _can_ compile non-C++-std-compliant C++ without any libraries 2008-09-13 14:49 oh, no designated initializers 2008-09-13 14:49 that's a killer 2008-09-13 14:50 even c99 is permabanned 2008-09-13 14:50 I probably know the feature, but I'm not sure what that refers to, is that the .something = something struct initializer 2008-09-13 14:50 for no reason whatsoever 2008-09-13 14:50 or the [d] = something 2008-09-13 14:50 right 2008-09-13 14:50 essential 2008-09-13 14:50 both? 2008-09-13 14:50 agreed, they're essential 2008-09-13 14:50 the former mostly 2008-09-13 14:50 I have used the second exactly once, and that was last week 2008-09-13 14:50 in tux3 2008-09-13 14:51 fear of exceptions is a good one, but you should just not use them ;-) 2008-09-13 14:51 it's mostly fear of hidden behavior 2008-09-13 14:51 linus hates that 2008-09-13 14:51 I've never seen a c++ prog that didn't have it 2008-09-13 14:51 hidden behaviour... is the C++ compiler more loose?
2008-09-13 14:51 way loose 2008-09-13 14:52 code generated is beyond pathetic compared to hand crafted C 2008-09-13 14:52 you can also write hand crafted c++ of course but nobody does 2008-09-13 14:52 even if you don't use code-killing features? 2008-09-13 14:52 (like exceptions, multiple inheritance, large parts of OO, etc) 2008-09-13 14:52 linuxers don't have that discipline 2008-09-13 14:52 [templates...] 2008-09-13 14:53 remember, 90%+ of linux is dodgy drivers written by people who wish it was saturday 2008-09-13 14:53 ah, so it really boils down to programmers 2008-09-13 14:53 I wish it was saturday 2008-09-13 14:53 (and it is!) 2008-09-13 14:53 me too 2008-09-13 14:53 right 2008-09-13 14:53 it's nice to be happy :) 2008-09-13 14:53 speaking of which 2008-09-13 14:53 nice when wishes come true 2008-09-13 14:53 nearly sk8 oclock 2008-09-13 14:53 and I'm nearly done with the atom revmap 2008-09-13 14:53 woohoo 2008-09-13 14:54 I need to include linux/fs.h apparently ;-) 2008-09-13 14:56 the fun starts 2008-09-13 14:58 well registering was simple 2008-09-13 14:59 of course there's no get_sb function declared...
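The two C99 initializer forms from the exchange above (`.field = value` for structs, `[index] = value` for arrays) look like this in plain C. The struct is a hypothetical, cut-down stand-in for the kernel's file_system_type, not its real layout, and the enum and table are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct file_system_type (not the real layout). */
struct fs_type_demo {
	const char *name;
	int fs_flags;
	int (*get_sb)(void);		/* placeholder, not the kernel's signature */
	struct fs_type_demo *next;
};

/* Form 1: .field = value. Fields not named are zero/NULL-initialised,
 * and the initialiser survives fields being reordered or added. */
static const struct fs_type_demo junk_fs_type = {
	.name     = "junkfs",
	.fs_flags = 1,			/* FS_REQUIRES_DEV's value in 2.6.26 */
	/* .get_sb and .next are not named, so they end up NULL */
};

/* Form 2: [index] = value, e.g. a table indexed by an enum. */
enum file_kind { KIND_REG, KIND_DIR, KIND_SYMLINK, KIND_MAX };

static const char *const kind_name[KIND_MAX] = {
	[KIND_SYMLINK] = "symlink",	/* order here doesn't matter */
	[KIND_REG]     = "reg",
	[KIND_DIR]     = "dir",
};
```

The first form is the one kernel code leans on most, since struct layouts change between releases and named fields keep old initialisers compiling.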
2008-09-13 14:59 naturally 2008-09-13 14:59 now you can see it in proc 2008-09-13 14:59 a lot of stuff is happening 2008-09-13 15:00 that's where you realize the vfs is actually oo, even if it was developed by folks with very little understanding of oo 2008-09-13 15:00 which includes me ;-) 2008-09-13 15:00 though to be sure my role in vfs devel was minor 2008-09-13 15:01 mainly just contributed the inode specialization model 2008-09-13 15:02 ok, atom reverse entries are being created 2008-09-13 15:02 now let's reverse an atom 2008-09-13 15:02 I suppose I could use the readdir interface for this 2008-09-13 15:03 that would be kind of perverse 2008-09-13 15:03 no, sorry, really perverse 2008-09-13 15:03 nodev junkfs 2008-09-13 15:03 I just won't do that 2008-09-13 15:03 good 2008-09-13 15:03 now to take a look at the flags 2008-09-13 15:04 the difference between a nodev and a dev is significant ;-) 2008-09-13 15:04 have fun will kill_litter_super 2008-09-13 15:04 with 2008-09-13 15:04 well, right now none are declared, although it is easy to make it dev 2008-09-13 15:04 I'd like to see what is available 2008-09-13 15:04 kinda easy 2008-09-13 15:04 and kinda not 2008-09-13 15:04 you will see 2008-09-13 15:04 it's not as crystalline as you think right now 2008-09-13 15:06 /* public flags for file_system_type */ #define FS_REQUIRES_DEV 1 #define FS_BINARY_MOUNTDATA 2 #define FS_HAS_SUBTYPE 4 #define FS_REVAL_DOT 16384 /* Check the paths ".", ".." for staleness */ #define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */
2008-09-13 15:06 so requires dev means a block device with the data, as opposed to just in mem 2008-09-13 15:06 I wonder if nfs needs dev 2008-09-13 15:06 probably not 2008-09-13 15:07 right 2008-09-13 15:07 nfs is nodev 2008-09-13 15:07 binary mount data is probably for nfs and smb because they use binary mount options and hence have special mount programs 2008-09-13 15:07 subtype - no idea 2008-09-13 15:07 good observation 2008-09-13 15:07 dev means "block dev" 2008-09-13 15:07 maybe something like fat 2008-09-13 15:07 is considered subtyped 2008-09-13 15:08 no idea either 2008-09-13 15:08 sounds like rot 2008-09-13 15:08 reval_dot seems especially useful for nfs, maybe others 2008-09-13 15:08 d_move - seems like something worth knowing 2008-09-13 15:08 although probably later on 2008-09-13 15:08 see, I never looked at all those flags 2008-09-13 15:08 worthwhile knowing there's an implementation option there 2008-09-13 15:08 they come and go 2008-09-13 15:08 not a stable api 2008-09-13 15:08 well, you have to develop for some api ;-) 2008-09-13 15:09 right 2008-09-13 15:09 internal kernel api is a moving target 2008-09-13 15:09 of course 2008-09-13 15:09 partly intentional to encourage out of tree people to merge 2008-09-13 15:09 partly to improve it 2008-09-13 15:10 probably worthwhile, although breakage for breakage's sake should be frowned upon, if it's just okay to change stuff, but only for the 'better', then that's another issue 2008-09-13 15:11 ok, let's see which fs'es use which flags 2008-09-13 15:11 yeah, I'm needlessly thorough...
but oh, well, can't change who and what I am 2008-09-13 15:11 that means I'll be able to ask you questions soon 2008-09-13 15:12 it's a feature, not a bug 2008-09-13 15:13 blockdev - tons, as expected - including nfsd (the server) 2008-09-13 15:13 although for nfsd it's actually checking you're exporting a fs with a dev backing 2008-09-13 15:13 wonder if that means you can't export ramfs 2008-09-13 15:14 oh because nfs uses the dev from cookie to lookup the fs 2008-09-13 15:15 ugh, broken 2008-09-13 15:15 [by design] 2008-09-13 15:16 although there's a hack to be able to re-export nfs mounts 2008-09-13 15:16 binary_mountdata 2008-09-13 15:16 we're going to have to copy this buffer and make it tux3 U #3 2008-09-13 15:17 coda, ncpfs (netware), nfs, smbfs/cifs 2008-09-13 15:17 so basically the complex net file systems 2008-09-13 15:17 all nodev? 2008-09-13 15:17 of course 2008-09-13 15:17 probably because they take so many options related to networking 2008-09-13 15:17 -> no binary_mountdata 2008-09-13 15:17 never looked at that 2008-09-13 15:18 the opposite of that is? 
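A small userspace sketch of how the fs_flags bits pasted above combine, and of the rule behind the "nodev junkfs" line seen in /proc/filesystems earlier: the first column reads "nodev" unless FS_REQUIRES_DEV is set. The constant values are copied from the 2.6.26 header paste; the helper names are hypothetical, and as noted in the channel these flags are internal API that can change between releases.

```c
#include <stdio.h>
#include <string.h>

/* Values from the 2.6.26 include/linux/fs.h paste above. */
#define FS_REQUIRES_DEV        1
#define FS_BINARY_MOUNTDATA    2
#define FS_HAS_SUBTYPE         4
#define FS_REVAL_DOT           16384
#define FS_RENAME_DOES_D_MOVE  32768

/* Hypothetical helper: write the set flags into buf as "A|B". */
static void fs_flags_str(int flags, char *buf, size_t len)
{
	static const struct { int bit; const char *name; } tab[] = {
		{ FS_REQUIRES_DEV,       "REQUIRES_DEV" },
		{ FS_BINARY_MOUNTDATA,   "BINARY_MOUNTDATA" },
		{ FS_HAS_SUBTYPE,        "HAS_SUBTYPE" },
		{ FS_REVAL_DOT,          "REVAL_DOT" },
		{ FS_RENAME_DOES_D_MOVE, "RENAME_DOES_D_MOVE" },
	};
	size_t i, used = 0;

	buf[0] = '\0';
	for (i = 0; i < sizeof(tab) / sizeof(tab[0]); i++) {
		int n;

		if (!(flags & tab[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", tab[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;		/* output truncated, stop here */
		used += (size_t)n;
	}
}

/* Hypothetical helper: one /proc/filesystems-style line, "nodev<TAB>name"
 * unless FS_REQUIRES_DEV is set. */
static void proc_fs_line(const char *name, int flags, char *buf, size_t len)
{
	snprintf(buf, len, "%s\t%s",
		 (flags & FS_REQUIRES_DEV) ? "" : "nodev", name);
}
```

A block-backed filesystem like the one junkfs is heading towards would set FS_REQUIRES_DEV and lose the "nodev" tag.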
2008-09-13 15:18 subtype appears to be a fuse hack 2008-09-13 15:18 ah, right 2008-09-13 15:18 the opposite of binary_mountdata is not putting it in flags 2008-09-13 15:18 see my complaint about fuse on that topic, thursday 2008-09-13 15:18 all 'normal' filesystems use text string mount options 2008-09-13 15:18 wrong idea 2008-09-13 15:18 each fuse fs should get its own type 2008-09-13 15:18 not all of the "fuse" 2008-09-13 15:18 just wrong 2008-09-13 15:19 okay, skipping fuse parsing ;-) giving me a headache 2008-09-13 15:20 REVAL_DOT 2008-09-13 15:20 is nfs only 2008-09-13 15:20 related to parent directory entries of a path being able to go stale 2008-09-13 15:20 revalidate 2008-09-13 15:20 something which is related to nfs protocol borkenness 2008-09-13 15:20 yes 2008-09-13 15:20 subtle 2008-09-13 15:20 although maybe hard to fix in a new netfs 2008-09-13 15:20 no, not hard 2008-09-13 15:21 just has to be stateful 2008-09-13 15:21 right 2008-09-13 15:21 stateless is unworkable braindamage 2008-09-13 15:21 but you want it both stateful, and stateless 2008-09-13 15:21 a pox upon us 2008-09-13 15:21 I don't agree 2008-09-13 15:21 lightweight state 2008-09-13 15:21 that scales 2008-09-13 15:21 is good 2008-09-13 15:21 nfs is bad 2008-09-13 15:21 doesn't work properly 2008-09-13 15:21 you want lightweight statefull with fallback to stateless 2008-09-13 15:21 trond will disagree of course 2008-09-13 15:21 the fallback being mostly for the server reboot/failover case 2008-09-13 15:22 you never want stateless 2008-09-13 15:22 stateless == brainless 2008-09-13 15:22 :-) 2008-09-13 15:22 well, would have to think about it more... stateless has nice features that you do want 2008-09-13 15:22 a nematode is getting close to stateless 2008-09-13 15:22 metanode? 2008-09-13 15:23 heh 2008-09-13 15:23 is that an anagram? 2008-09-13 15:23 wow 2008-09-13 15:23 so which is it? 
2008-09-13 15:23 don't start with puns now ;-) 2008-09-13 15:23 nematode: disgusting little worm 2008-09-13 15:23 right, but is there an fs concept called nematode? 2008-09-13 15:23 nfs: disgusting little hack that grew up into a huge disgusting little worm 2008-09-13 15:24 no 2008-09-13 15:24 just me dissing nfs 2008-09-13 15:24 wish trond were here ;-) 2008-09-13 15:24 so you just mistyped metanode? or you meant the worm 2008-09-13 15:24 seem - I'm clueless 2008-09-13 15:24 sarcasm/irony/human interaction just fly right over me 2008-09-13 15:24 no, I meant to type nematode, I was comparing nfs to a nematode 2008-09-13 15:24 s/seem/see/ 2008-09-13 15:25 both are nearly stateless 2008-09-13 15:25 wait a minute - nfs is stateless... isn't it? 2008-09-13 15:25 not quite 2008-09-13 15:25 lockd implements a stateful protocol 2008-09-13 15:25 it's fakery to pretend it doesn't 2008-09-13 15:25 right - those are extensions 2008-09-13 15:25 although 2008-09-13 15:26 to be fair running without it doesn't happen 2008-09-13 15:26 also tcp 2008-09-13 15:26 can't really be separated 2008-09-13 15:26 not really 2008-09-13 15:30 nfs is actually like 4 fs'es 2008-09-13 15:30 2 being v3 vs v4 2008-09-13 15:30 and 2 being normal vs cross-device registration hackery 2008-09-13 15:30 so you have 2 * 2 = 4 2008-09-13 15:30 right 2008-09-13 15:30 anyway 2008-09-13 15:30 D_MOVE 2008-09-13 15:30 it's rather cleverly and lazily compressed into fairly small source 2008-09-13 15:30 in linux 2008-09-13 15:30 apparently used by 2008-09-13 15:31 nfs and ocfs2 2008-09-13 15:31 probably related to directory deletions in some way 2008-09-13 15:31 what does it do?
2008-09-13 15:31 so much hackery in linux is because of nfs 2008-09-13 15:31 we'd be way better off if it had never been written 2008-09-13 15:32 well, some people make a living from it 2008-09-13 15:32 so they are ok 2008-09-13 15:32 and they are generally good to drink with 2008-09-13 15:32 especially good to drink with 2008-09-13 15:32 I think there must be a connection 2008-09-13 15:32 nope renames 2008-09-13 15:33 right, dentry move 2008-09-13 15:33 actually I dimly recall that 2008-09-13 15:33 so basically this is something along the lines of support for atomic renames 2008-09-13 15:33 a big wart in dentry cache 2008-09-13 15:33 and somehow nfs and ocfs2 are special 2008-09-13 15:33 I wonder why ocfs2 needs it 2008-09-13 15:34 can ask mark fasheh about that 2008-09-13 15:34 FS will handle d_move() 99 * during rename() internally. 2008-09-13 15:34 from the header file for that #define 2008-09-13 15:34 okay, looks like at first glance (as expected) we just need a backing blockdev 2008-09-13 15:34 I strongly suspect it was to solve a locking bottleneck in ocfs2 2008-09-13 15:35 but worth knowing for future design that there are such hacks 2008-09-13 15:35 for rename and for stale . and .. 2008-09-13 15:36 yes 2008-09-13 15:36 probably want to avoid it, but... 2008-09-13 15:36 what's ocfs2? 2008-09-13 15:36 nice little cluster filesystem from oracle 2008-09-13 15:36 quite underrated 2008-09-13 15:37 question about filenames 2008-09-13 15:37 does the vfs layer enforce, no nulls and no slashes in a file name? 2008-09-13 15:37 but otherwise anything goes? 
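On the filename question: the VFS splits paths on '/' and takes NUL-terminated strings from userspace, so a component that reaches a filesystem can contain neither byte, while "." and ".." are resolved by path walking itself. Filesystems usually also cap the component length (NAME_MAX, 255 bytes, on Linux). A hypothetical checker stating those rules, taking the name as pointer plus length the way the dcache's struct qstr carries names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NAME_MAX 255	/* Linux NAME_MAX */

/* Hypothetical: would this byte string be a usable directory entry name? */
static bool valid_component(const char *name, size_t len)
{
	if (len == 0 || len > DEMO_NAME_MAX)
		return false;
	if (memchr(name, '/', len) || memchr(name, '\0', len))
		return false;	/* can't arrive through path lookup anyway */
	if (name[0] == '.' && (len == 1 || (len == 2 && name[1] == '.')))
		return false;	/* "." and ".." belong to the VFS */
	return true;		/* every other byte, UTF-8 or not, is fine */
}
```

So "anything goes" is roughly right: the kernel treats names as opaque bytes, and any character-set policy lives in userspace.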
2008-09-13 15:38 okay nodev is out 2008-09-13 15:39 next step - what the hell is get_sb ;-) 2008-09-13 15:39 main: >>> found unatom entry 12 for atom 1 2008-09-13 15:40 now to print out the name 2008-09-13 15:41 main: found unatom entry 0 for atom 0 2008-09-13 15:41 main: found unatom entry 12 for atom 1 2008-09-13 15:41 main: found unatom entry 24 for atom 2 2008-09-13 15:41 main: found unatom entry 0 for atom 3 2008-09-13 15:41 main: found unatom entry 0 for atom 4 2008-09-13 15:41 etc 2008-09-13 15:42 well, it should not be 0 for unknown atoms 2008-09-13 15:42 probably 2008-09-13 15:42 oh 2008-09-13 15:42 sure it should 2008-09-13 15:42 unused entry in the unatom table 2008-09-13 15:46 unused entry in the unatom table 2008-09-13 15:46 whoops 2008-09-13 15:47 main: found unatom entry 0 for atom 0 2008-09-13 15:47 0xb7d10400: 00 00 00 00 0c 00 03 00 66 6f 6f dd 01 00 00 00 "........foo....." 2008-09-13 15:47 there we go 2008-09-13 15:47 reversed 2008-09-13 15:47 time to skate 2008-09-13 16:01 -!- caoliver(~oliver@75-134-208-20.dhcp.trcy.mi.charter.com) has joined #tux3 2008-09-13 16:02 6,800 lines 2008-09-13 16:02 only added about 300 including xattr support and atom refcounting 2008-09-13 16:02 xattrs will come in at around 500 lines total and be perfectly usuable 2008-09-13 16:02 superior maybe 2008-09-13 16:03 sk8 oclock 2008-09-13 16:03 really 2008-09-13 16:09 made further improvements to debugging: 2008-09-13 16:09 @/home/maze/junkfs/super.c:41 - Entering: init_junk_fs() 2008-09-13 16:09 @/home/maze/junkfs/super.c:26 - Entering: test(a=(int)5, b=(int)6) 2008-09-13 16:09 @/home/maze/junkfs/super.c:27 - Returning: test(...) = a + b = (int)11 2008-09-13 16:09 @/home/maze/junkfs/super.c:46 - Mark in init_junk_fs(...) err=(int)0 2008-09-13 16:09 @/home/maze/junkfs/super.c:49 - Returning: init_junk_fs(...) 
= 0 = (int)0 2008-09-13 16:09 @/home/maze/junkfs/super.c:55 - Entering: exit_junk_fs() 2008-09-13 16:10 @/home/maze/junkfs/super.c:57 - Returning: exit_junk_fs(...) = void 2008-09-13 16:10 I have to admit, your plan to write a fs to learn vfs is working out well 2008-09-13 16:10 spaces around the = please ;-) 2008-09-13 16:11 hmm, but those are like colons or something 2008-09-13 16:11 then colon with one space after 2008-09-13 16:11 lindent 2008-09-13 16:11 might as well get used to it 2008-09-13 16:11 well, this is text output from dmesg 2008-09-13 16:12 but yeah ': ' is probably better 2008-09-13 16:12 still 2008-09-13 16:12 it may well escape to lkml one day 2008-09-13 16:12 who knows 2008-09-13 16:12 @/home/maze/junkfs/super.c:26 - Entering: test(a: (int)5, b: (int)6) 2008-09-13 16:12 @/home/maze/junkfs/super.c:27 - Returning: test(...) = a + b = (int)11 2008-09-13 16:12 @/home/maze/junkfs/super.c:46 - Mark in init_junk_fs(...) err: (int)0 2008-09-13 16:13 does look better 2008-09-13 16:13 printk bytes are cheap ;-) 2008-09-13 16:13 yes 2008-09-13 16:13 easier on my eyes 2008-09-13 16:13 pleasant even 2008-09-13 16:13 changed already 2008-09-13 16:13 I noticed 2008-09-13 16:13 it was after all a 2 byte change 2008-09-13 16:13 changed, tested and pasted into the cloud 2008-09-13 16:13 that's the spirit 2008-09-13 16:13 I should probably setup a repository for this junkfs 2008-09-13 16:14 and work on getting a kvm debug working 2008-09-13 16:14 probably don't want to muck around with the get_sb stuff on my live box 2008-09-13 16:14 also have to figure out how to get compile junk to go elsewhere 2008-09-13 16:14 than the source dir 2008-09-13 16:16 yes 2008-09-13 16:17 cd your/source 2008-09-13 16:17 hg init 2008-09-13 16:17 hg add . 2008-09-13 16:17 hg commit 2008-09-13 16:17 that's all there is to it 2008-09-13 16:17 hg is mercurial? 
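The DBG_ENTER/DBG_RETURN macros behind the junkfs trace aren't shown in the channel, so the following is only a guess at how the improved format pasted above (`Entering: test(a: (int)5, b: (int)6)`, `Returning: test(...) = a + b = (int)11`) could be produced, written as userspace C with fprintf standing in for printk. As discussed earlier, DBG_RETURN1 evaluates its expression once into a temporary and returns from inside the macro; the do { } while (0) wrappers keep each macro a single statement. Arguments go through a long long cast, so this sketch only handles integer types, and the traced function is named junk_add here.

```c
#include <stdio.h>
#include <string.h>

/* Last trace line, kept so it can be inspected (hypothetical helper). */
static char dbg_last[256];

#define DBG_EMIT(fmt, ...)						\
	do {								\
		snprintf(dbg_last, sizeof(dbg_last), "@%s:%d - " fmt,	\
			 __FILE__, __LINE__, __VA_ARGS__);		\
		fprintf(stderr, "%s\n", dbg_last);			\
	} while (0)

#define DBG_ENTER0() DBG_EMIT("Entering: %s()", __func__)

/* #a / #t1 stringify the argument name and type for the trace. */
#define DBG_ENTER2(t1, a, t2, b)					\
	DBG_EMIT("Entering: %s(%s: (%s)%lld, %s: (%s)%lld)", __func__,	\
		 #a, #t1, (long long)(a), #b, #t2, (long long)(b))

/* Evaluate expr once, trace expression text and value, then return it. */
#define DBG_RETURN1(type, expr)						\
	do {								\
		type dbg_ret_ = (expr);					\
		DBG_EMIT("Returning: %s(...) = %s = (%s)%lld", __func__,	\
			 #expr, #type, (long long)dbg_ret_);		\
		return dbg_ret_;					\
	} while (0)

#define DBG_RETURN0()							\
	do {								\
		DBG_EMIT("Returning: %s(...) = void", __func__);	\
		return;							\
	} while (0)

static int junk_add(int a, int b)
{
	DBG_ENTER2(int, a, int, b);
	DBG_RETURN1(int, a + b);
}

static void junk_exit(void)
{
	DBG_ENTER0();
	DBG_RETURN0();
}
```

Keeping DBG_ENTER on its own line, as suggested in the channel, means a single function's tracing can be switched off without touching its declaration.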
2008-09-13 16:17 yes 2008-09-13 16:18 probably need to install it first then 2008-09-13 16:19 I'm really pleased with the xattr atom stuff 2008-09-13 16:19 awesome 2008-09-13 16:19 need to use some slight imagination to see how it will perform with a little cache in front of it, and to see the impact of atomic update/log rollup 2008-09-13 16:19 but otherwise I guess it's done 2008-09-13 16:19 some fiddling 2008-09-13 16:20 no more questions about potential lurking complexity 2008-09-13 16:20 and whether it can emulate straight ascii strings 2008-09-13 16:20 I don't think we need an option, really 2008-09-13 16:21 cool 2008-09-13 16:21 one thing missing: find a free atom 2008-09-13 16:21 to use 2008-09-13 16:21 instead of blindly generating new ones, need to code that 2008-09-13 16:22 the plan is to just let the thing expand up to some size, count the deletions in it, then when deletions/size exceeds a threshold, we rescan for deleted entries 2008-09-13 16:22 deleted atoms 2008-09-13 16:22 probably overkill 2008-09-13 16:22 an alternative is to put a linked list of free atoms in the unatom table 2008-09-13 16:22 better 2008-09-13 16:23 oh shiny - finally rhel4.7 is out in centos 4.7 2008-09-13 16:23 yup free atom list is always better 2008-09-13 16:24 it shall be so 2008-09-13 16:24 will code that when I get back 2008-09-13 16:24 also need to code the atom table dump 2008-09-13 16:25 so some more fiddling until I can escape to more interesting things 2008-09-13 16:32 -!- caoliver(~oliver@75-134-208-20.dhcp.trcy.mi.charter.com) has left #tux3 2008-09-13 17:27 one slight drawback to my atable design I just noticed 2008-09-13 17:27 putting the tables up so high will make the radix tree quite deep 2008-09-13 17:27 I think 2008-09-13 17:27 so when I map at block 2^28 2008-09-13 17:28 radix tree has 2^5 fanout 2008-09-13 17:28 that is 6 radix tree levels 2008-09-13 17:28 probably nothing to worry about 2008-09-13 17:28 we zip through those very fast 2008-09-13 17:28 and with
the hash in front of it, the overhead will disappear in the noise, if it was not already 2008-09-13 17:29 against that, we have the pleasing property of only having to sync one file to sync the entire atable including refcounts and reverse map 2008-09-13 20:25 -!- tim_dimm(~mobile@32.156.233.244) has joined #tux3 2008-09-13 20:27 -!- tim_dimm(~mobile@32.156.233.244) has joined #tux3 2008-09-13 20:28 Howdy 2008-09-13 20:29 Got a geeky irc app for my phone 2008-09-13 20:29 :-) 2008-09-13 20:32 -!- tim_dimm(~mobile@32.156.233.244) has joined #tux3 2008-09-13 22:15 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-13 22:15 wb maze 2008-09-13 22:15 hey 2008-09-13 22:15 show_freeatoms: next = dead000000000666 2008-09-13 22:16 linked list 2008-09-13 22:16 nontrivial proposition when it's linked through disk blocks 2008-09-13 22:16 see the cute magic number 2008-09-13 22:16 for deleted atom 2008-09-13 22:16 I've a question about scheduling priorities... [yes, cute indeed] 2008-09-13 22:17 I'm not much of a scheduler person but fire away 2008-09-13 22:17 so, if we have a pre-emptible kernel, and one low-priority (ie. niced) user process does something which results in a call to the fs code, what priority does that code run within the kernel? 2008-09-13 22:18 same 2008-09-13 22:18 niced 2008-09-13 22:18 will it also be effectively scheduled as a niced task? giving way to other threads of execution of higher priority? 2008-09-13 22:18 yes 2008-09-13 22:18 so does linux then automatically boost thread execution priority within the kernel if a low-prio thread is blocking a higher-prio thread by holding a lock? 2008-09-13 22:19 [and, can thread priority be manually temporarily increased/decreased/changed within kernel code - for whatever reason] 2008-09-13 22:20 there's some priority inheritance stuff, yes, but I'm not familiar with it 2008-09-13 22:20 ie.
how does the linux kernel deal with std priority inversion jazz 2008-09-13 22:20 you can do whatever you want in kernel 2008-09-13 22:20 ahh 2008-09-13 22:20 including changing priority 2008-09-13 22:20 of your task or any other 2008-09-13 22:21 you can also fill the entire kernel with zero ;-) 2008-09-13 22:21 true - good point 2008-09-13 22:21 although I was trying to read the bootid uuid from my module, and that actually turns out to be very non-trivial 2008-09-13 22:21 didn't say anything was easy 2008-09-13 22:21 almost nothing is 2008-09-13 22:21 but you can do it 2008-09-13 22:21 since it's not exported, and I don't see a good way to grab sysctls from within the kernel ;-) and the interfaces are always user-oriented 2008-09-13 22:22 [obviously here easiest solution is to fix random.c to export the boot_id... but that's not something that can be done in a module] 2008-09-13 22:22 you'll get frustrated about what is not exported, until you realize... just export it 2008-09-13 22:22 right 2008-09-13 22:22 if it's a stupid idea you'll find out soon enough 2008-09-13 22:22 it just won't compile on 'older' kernels then 2008-09-13 22:23 or get past linus usually 2008-09-13 22:23 following linked lists is always scary 2008-09-13 22:23 never feels like it's going to terminate 2008-09-13 22:23 it did: 2008-09-13 22:23 show_freeatoms: next = dead000000000666 2008-09-13 22:23 show_freeatoms: next = dead000000000000 2008-09-13 22:24 this time 2008-09-13 22:24 well there's basically a static char[16] with the bootid, it's got links to it, but...
ugh 2008-09-13 22:24 notice the first atom to die was number 0 2008-09-13 22:24 so 0 is a valid atom 2008-09-13 22:24 probably going to regret that 2008-09-13 22:24 yup ;-) 2008-09-13 22:24 I always make 0 invalid 2008-09-13 22:24 or free 2008-09-13 22:24 or something 2008-09-13 22:24 well 2008-09-13 22:25 don't make it invalid just because you're lame ;) 2008-09-13 22:25 make it invalid because you have a good reason 2008-09-13 22:25 I don't have a good reason for atom zero yet 2008-09-13 22:25 but there likely is one 2008-09-13 22:25 good reason: it's easier on the eyes when you later debug it 2008-09-13 22:25 you never expect a 0 value to actually be pointing/referencing to something 2008-09-13 22:25 the magic number there makes things pretty unambiguous 2008-09-13 22:26 actually with the magic number - sure 2008-09-13 22:26 depends 2008-09-13 22:26 it's without that I'd be worried 2008-09-13 22:26 zero is often a valid offset 2008-09-13 22:26 ie. before it's dead 2008-09-13 22:26 it is a valid dirent offset in ext2 for example 2008-09-13 22:26 yes, but offset is a separate matter 2008-09-13 22:26 well 2008-09-13 22:26 they're all offsets 2008-09-13 22:26 ok, right it needs some deeper thinko 2008-09-13 22:26 there is no such thing as an absolute address any more ;) 2008-09-13 22:27 -!- tim_dimm(~timothyhu@adsl-67-114-40-138.dsl.scrm01.pacbell.net) has joined #tux3 2008-09-13 22:27 hi daddy_dimm 2008-09-13 22:27 chatting on your iphone?
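A minimal in-memory sketch of the free-atom list idea: each dead slot in a hypothetical unatom array holds a 0xdead magic in its top 16 bits plus the number of the next free atom, loosely modelled on the show_freeatoms output above. Because atom 0 is valid, the end of the list is an explicit FREE_END value rather than 0. The real table is linked through file blocks, which this sketch ignores, and the layout, names and table size are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEAD_MAGIC 0xdead000000000000ULL	/* tag marking a free slot */
#define DEAD_MASK  0xffff000000000000ULL
#define FREE_END   0xffffffffULL		/* end of list (0 is a valid atom) */
#define NATOMS     16				/* toy, fixed table size */

/* Hypothetical table; assumes live entries never carry the dead tag. */
struct atable {
	uint64_t slot[NATOMS];	/* live: reverse-map data; dead: magic | next */
	uint64_t free_head;	/* first free atom, or FREE_END */
	uint64_t next_new;	/* lowest never-used atom number */
};

static bool atom_is_dead(const struct atable *t, uint64_t atom)
{
	return (t->slot[atom] & DEAD_MASK) == DEAD_MAGIC;
}

/* Push a released atom onto the head of the free list. */
static void atom_free(struct atable *t, uint64_t atom)
{
	t->slot[atom] = DEAD_MAGIC | t->free_head;
	t->free_head = atom;
}

/* Reuse a dead atom if there is one, otherwise hand out a fresh number.
 * Returns FREE_END when the toy table is full. */
static uint64_t atom_alloc(struct atable *t)
{
	if (t->free_head != FREE_END) {
		uint64_t atom = t->free_head;

		t->free_head = t->slot[atom] & ~DEAD_MASK;
		t->slot[atom] = 0;
		return atom;
	}
	if (t->next_new < NATOMS)
		return t->next_new++;
	return FREE_END;
}
```

Start from `struct atable t = { .free_head = FREE_END };`. Allocation pops the most recently freed atom first, so no rescan of the table is ever needed.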
2008-09-13 22:27 yup 2008-09-13 22:27 leet 2008-09-13 22:27 ulberleet 2008-09-13 22:27 uber even 2008-09-13 22:27 makin' sure it came through 2008-09-13 22:27 it did 2008-09-13 22:27 k, pissin off the wife now 2008-09-13 22:27 then you had to change a diaper or something 2008-09-13 22:27 I should go 2008-09-13 22:27 heh 2008-09-13 22:28 short leash 2008-09-13 22:28 it was ever thus 2008-09-13 22:28 it was your idea ;) 2008-09-13 22:28 got 2500 to change minimum if I do my part 2008-09-13 22:28 k 2008-09-13 22:28 yammer at ya later 2008-09-13 22:28 later 2008-09-13 22:28 toodles 2008-09-13 22:29 what's up with zumastor? 2008-09-13 22:29 ok, now I just need to allocate from that list, then I'm done for the night 2008-09-13 22:29 again, nontrivial 2008-09-13 22:29 when the list is linked through file blocks 2008-09-13 22:29 entirely different scale of hacking than in memory 2008-09-13 22:31 I see I have some buffer leaks to chase 2008-09-13 22:31 so... it's going to be a while 2008-09-13 22:31 before I can rest 2008-09-13 22:35 -!- stargazr5(~gauravstt@59.95.17.142) has joined #tux3 2008-09-13 22:38 http://lxr.linux.no/linux+v2.6.26.5/drivers/md/md.c#L595 2008-09-13 22:38 what the hell is that code doing - and why? 2008-09-13 22:38 isn't that spurious? 2008-09-13 22:38 never mind 2008-09-13 22:38 dealing with carry 2008-09-13 22:39 two iterations, yes 2008-09-13 22:39 still looks crappy 2008-09-13 22:41 it's the sort of stuff that is cleaner in assembler 2008-09-13 22:42 adc %ah,%al; adc 0,%al - or whatever the proper registers are called nowadays 2008-09-13 22:42 uhm, first one add, second adc 2008-09-13 22:43 and that's merely 16 bit -> 8 bit, not 32->16 2008-09-13 22:43 much cleaner 2008-09-13 22:46 okay, trying to read in a superblock now, using bios 2008-09-13 22:56 brave 2008-09-13 22:57 submit_bio()... then what?
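The carry handling discussed above is end-around carry folding, as used for ones'-complement checksums. Here is a sketch of the general technique (not a copy of the md.c code) that folds a 32-bit accumulator to 16 bits. Two folds are always enough: after the first, the sum is at most 0x1fffe, so the second fold can carry at most once more and the result fits in 16 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold32to16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);	/* may leave a carry in bit 16 */
	sum = (sum & 0xffff) + (sum >> 16);	/* absorbs that last carry */
	return (uint16_t)sum;
}

/* Ones'-complement sum of n 16-bit words: accumulate wide, fold at the end.
 * The 32-bit accumulator cannot overflow for n <= 65537 words. */
static uint16_t ones_sum16(const uint16_t *w, size_t n)
{
	uint32_t acc = 0;
	size_t i;

	for (i = 0; i < n; i++)
		acc += w[i];
	return fold32to16(acc);
}
```

In x86 terms each fold is an add of the two halves followed by an adc of zero, which is the assembler version being sketched in the channel.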
2008-09-13 23:00 glacing through other fs'es 2008-09-13 23:00 and other kernel subsystem 2008-09-13 23:00 the submit code in swap.c seems promising 2008-09-13 23:00 also check block_read_full_page 2008-09-13 23:00 and friends 2008-09-13 23:01 you need to set up an endio that unlocks something 2008-09-13 23:01 wakes up your process typically 2008-09-13 23:01 and you have to remember what you are supposed to wake up somehow 2008-09-13 23:01 in the private field of the bio 2008-09-13 23:01 you will typically stick some state struct in there 2008-09-13 23:01 this is working on the metal 2008-09-13 23:02 :-) cool. 2008-09-13 23:02 you could try the prepare_to_sleep etc api here 2008-09-13 23:02 submit, then sleep 2008-09-13 23:03 see, that's why people tend to use submit_bh, because then you can do wait_on_buffer 2008-09-13 23:03 but it's a very crufty path 2008-09-13 23:08 http://lxr.linux.no/linux+v2.6.26.5/mm/page_io.c#L25 2008-09-13 23:08 that looks like it might be a decent example of asynch bio handling 2008-09-13 23:09 pretty good 2008-09-13 23:10 end page writeback will do a bunch of stuff you don't need 2008-09-13 23:11 but writebackk will only happen if I dirty the page right? 2008-09-13 23:11 actually, scratch that 2008-09-13 23:11 I'm still not quite sure, whether this interface is read/write or mmap or both 2008-09-13 23:11 you're in complete control 2008-09-13 23:12 when you go submit_bio, stuff starts to happen 2008-09-13 23:12 but you will not be able to use these functions directly 2008-09-13 23:12 just use as a guid to write your own 2008-09-13 23:12 right 2008-09-13 23:12 somebody ought to make a simple "read that into this page" 2008-09-13 23:12 can pages be shared between kernel and userspace? 2008-09-13 23:12 based on this 2008-09-13 23:12 but nobody has that I know 2008-09-13 23:13 only by mapping into a page table 2008-09-13 23:13 ie. both the kernel and userspace have a part of disk mmap'ed into phys memory? 
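The submit-then-sleep pattern described above (an end_io callback that finds its state through the bio's private field and wakes the submitter) can be sketched in userspace. This is an analogy only, not kernel code: `struct fake_bio`, `read_endio`, `fake_device` and `read_block_sync` are hypothetical stand-ins, a pthread plays the disk, and a mutex plus condition variable take the place of a kernel wait queue or completion.

```c
#include <pthread.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio: just the fields this pattern needs. */
struct fake_bio {
	unsigned char *data;                           /* destination buffer */
	size_t len;
	void (*end_io)(struct fake_bio *bio, int err); /* completion callback */
	void *private;                                 /* like bi_private: whom to wake */
};

/* Waiter state the submitter parks in bio->private. */
struct io_wait {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int done;
	int err;
};

/* Completion handler: record the status and wake the sleeping submitter. */
static void read_endio(struct fake_bio *bio, int err)
{
	struct io_wait *w = bio->private;

	pthread_mutex_lock(&w->lock);
	w->err = err;
	w->done = 1;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/* Pretend device: fills the buffer on another thread, then calls end_io. */
static void *fake_device(void *arg)
{
	struct fake_bio *bio = arg;

	memset(bio->data, 0xab, bio->len);
	bio->end_io(bio, 0);
	return NULL;
}

/* Submit, then sleep until end_io fires. Returns the I/O status, or -1. */
static int read_block_sync(unsigned char *buf, size_t len)
{
	struct io_wait w;
	struct fake_bio bio = { buf, len, read_endio, &w };
	pthread_t tid;

	pthread_mutex_init(&w.lock, NULL);
	pthread_cond_init(&w.cond, NULL);
	w.done = 0;
	w.err = 0;

	if (pthread_create(&tid, NULL, fake_device, &bio) != 0) {
		pthread_mutex_destroy(&w.lock);  /* "submit" failed */
		pthread_cond_destroy(&w.cond);
		return -1;
	}

	pthread_mutex_lock(&w.lock);
	while (!w.done)  /* loop guards against spurious wakeups */
		pthread_cond_wait(&w.cond, &w.lock);
	pthread_mutex_unlock(&w.lock);

	pthread_join(tid, NULL);
	pthread_mutex_destroy(&w.lock);
	pthread_cond_destroy(&w.cond);
	return w.err;
}
```

In the kernel the same shape would hang the waiter state off `bio->bi_private` and sleep on a wait queue or completion; the exact `bi_end_io` and `submit_bio` signatures vary by kernel version, so the mm/page_io.c path mentioned in the chat is the place to check them.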
2008-09-13 23:13 and whichever edits, the other sees? 2008-09-13 23:13 yes 2008-09-13 23:13 by doing the mapping through the pagetable? 2008-09-13 23:13 done all the time 2008-09-13 23:13 yes 2008-09-13 23:14 I'm taking baby steps here ;-) 2008-09-13 23:14 you have to ask the right way, get it setup correctly 2008-09-13 23:14 so that it will be recovered properly when your process exits 2008-09-13 23:14 and so on 2008-09-13 23:15 it's a big topic 2008-09-13 23:15 we're going to do "read" on tuesday 2008-09-13 23:16 that in itself is a big topic 2008-09-14 02:04 ACTION is back from (goth club) Sabbat 2008-09-14 02:04 flips: oct 31th is halloween btw 2008-09-14 02:04 you might like to reschedule 2008-09-14 02:04 of course 2008-09-14 02:04 that was the point 2008-09-14 02:05 ACTION chuckles 2008-09-14 02:05 you going trick or treating? 2008-09-14 02:05 ok, now I would like to know your rationale for this 2008-09-14 02:05 probably neither, I don't live in an area that's family friendly 2008-09-14 02:06 be back in a bit 2008-09-14 02:06 I applaud you for your sense of humor, but I'm left a bit hanging as to what you're intending 2008-09-14 02:06 ACTION is freaking drunk now 2008-09-14 02:06 drunk IRCing 2008-09-14 02:06 I'll be sober in about 1/2 hour or so 2008-09-14 02:06 ok 2008-09-14 02:08 ACTION heads to get late night food 2008-09-14 02:08 flips: btw, I live right behind a big goth night/club in San Diego, Sabbat 2008-09-14 02:09 you can get a listing of clubs from socalgoth (southern cal goth) 2008-09-14 02:09 which has unified LA through SD listing 2008-09-14 02:09 for goth/industrial events 2008-09-14 04:13 -!- trymeeeee(~zxcvbnm@123.236.188.107) has joined #tux3 2008-09-14 05:28 -!- Aks(~ankitsriv@123.237.71.198) has joined #tux3 2008-09-14 05:34 -!- Aks(~ankitsriv@123.237.71.198) has left #tux3 2008-09-14 11:05 -!- pgquiles(~pgquiles@50.Red-79-153-248.staticIP.rima-tde.net) has joined #tux3 2008-09-14 11:13 -!- stargazr5(~gauravstt@59.95.30.8) has joined 
#tux3 2008-09-14 11:51 -!- Kirantpatil(~kiran@122.167.212.171) has joined #tux3 2008-09-14 11:51 -!- Kirantpatil(~kiran@122.167.212.171) has left #tux3 2008-09-14 11:54 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-14 13:06 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-14 16:04 ACTION starts a new edition of the Tux3 Report 2008-09-14 16:05 "Xattrs and Atoms" 2008-09-14 16:05 nice to have this as a fait accompli 2008-09-14 16:05 almsot 2008-09-14 16:05 just have to set the layout fields for the real filesystem and do some full system testing 2008-09-14 16:38 sk8 oclock 2008-09-14 17:17 hey 2008-09-14 19:46 flips: check my repo 2008-09-14 19:46 fix for inode->xcache leak 2008-09-14 19:59 will do 2008-09-14 20:00 pulls very slowly when there's a simultaneous kernel download in progress 2008-09-14 20:00 got to get me sum more a that bandwidth 2008-09-14 20:02 shapor, what's that (int)strlen(name) for? 2008-09-14 20:13 hey, should I do the tux3 kernel part with git or mercurial? 2008-09-14 20:14 maybe I should as on the mercurial channel 2008-09-14 20:16 hey shapor 2008-09-14 20:22 it'll eventually need to be git in the kernel 2008-09-14 20:27 %.*s expects int not size_t 2008-09-14 20:28 flips: ^ 2008-09-14 20:28 konrad, not really 2008-09-14 20:28 there are some mercurial projects for kernel things, for example btrfs 2008-09-14 20:28 hm, ok 2008-09-14 20:29 it's rather stupid for strlen to return size_t 2008-09-14 20:29 indeed 2008-09-14 20:29 kinda makes you want to reimplement it, doesnt it? 2008-09-14 20:29 like anybody should scan that much ascii text looking for a crappy null byte 2008-09-14 20:29 in asm ;) 2008-09-14 20:30 your basic 5 byte assembly program 2008-09-14 20:30 12 if you can a really fancy fast one 2008-09-14 20:30 scasb makes it easy doesnt it? 
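On the `(int)strlen(name)` question: the precision argument consumed by `%.*s` must be an `int`, while `strlen` returns `size_t`, so the cast is needed to pass the right type through the varargs. The same form also prints names that are not NUL-terminated, such as directory entry names stored with an explicit length. A small sketch; `format_name` is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a length-counted (not necessarily
 * NUL-terminated) name into out. The "*" precision must be an int,
 * hence the cast from size_t; at most namelen bytes of name are read.
 */
static int format_name(char *out, size_t outlen, const char *name, size_t namelen)
{
	return snprintf(out, outlen, "name=%.*s", (int)namelen, name);
}
```

A length above `INT_MAX` would not survive the cast, which is the size_t/int mismatch being grumbled about; filenames are nowhere near that.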
2008-09-14 20:30 scasb is slow on a lot of procs 2008-09-14 20:30 have'nt been keeping up with the latest 2008-09-14 20:30 hmm 2008-09-14 20:31 but a simple look using basic register instructions is fastest today 2008-09-14 20:31 let the superscaler logic do its thing 2008-09-14 20:31 and the shadow registers 2008-09-14 20:31 simple loop 2008-09-14 20:31 anyway, we're git 2008-09-14 20:31 I just checked it in 2008-09-14 20:31 tux3 stub kernel fs is landing tonight 2008-09-14 20:32 shapor, anyway we have %t 2008-09-14 20:32 that's for this braindamage I think 2008-09-14 20:32 wll 2008-09-14 20:32 doesn't work for %.*s 2008-09-14 20:32 yuck 2008-09-14 20:33 stupid ancient unix gods 2008-09-14 20:34 heh 2008-09-14 20:38 there we go 2008-09-14 20:41 typical linux: CONFIG_MMU means "CONFIG_NOMMU" 2008-09-14 20:42 tux3 will not support nommu for now 2008-09-14 20:42 if somebody wants that they can pay for it 2008-09-14 20:43 ah, ramfs has an actual application 2008-09-14 20:43 it implements rootfs 2008-09-14 20:43 that's why it got a little bloaty 2008-09-14 20:43 lately 2008-09-14 20:47 * Tux3 Versioning Filesystem 2008-09-14 20:47 * 2008-09-14 20:47 * Portions Copyright (C) 2000 Linus Torvalds, 2000 Transmeta Corp. 2008-09-14 20:47 * Licensed under the GPL v2 2008-09-14 20:47 */ 2008-09-14 20:47 well 2008-09-14 20:47 what about the other (c) 2008-09-14 20:48 * Tux3 Versioning Filesystem 2008-09-14 20:48 * 2008-09-14 20:48 * Copyright (c) 2008, Daniel Phillips 2008-09-14 20:48 * Portions Copyright (C) 2000 Linus Torvalds, 2000 Transmeta Corp. 
2008-09-14 20:48 * Licensed under the GPL v2 2008-09-14 20:48 */ 2008-09-14 20:48 there we go 2008-09-14 20:50 one little c one C 2008-09-14 20:52 hrm there is a still one leak in the inode test 2008-09-14 20:52 ==15560== 8,160 (8,040 direct, 120 indirect) bytes in 1 blocks are definitely lost in loss record 4 of 7 2008-09-14 20:52 ==15560== at 0x4A1B858: malloc (vg_replace_malloc.c:149) 2008-09-14 20:52 ==15560== by 0x401DC2: new_map (buffer.c:442) 2008-09-14 20:52 ==15560== by 0x40988C: new_inode (inode.c:111) 2008-09-14 20:52 ==15560== by 0x40AD46: make_tux3 (inode.c:476) 2008-09-14 20:52 linus wrote the big C 2008-09-14 20:52 ==15560== by 0x40B17B: main (inode.c:530) 2008-09-14 20:53 I'm not correcting his typos 2008-09-14 20:53 I treat his copyright notice as (c) linus 2008-09-14 20:53 well 2008-09-14 20:53 it does look stupid 2008-09-14 20:53 there we go, changed to (c), I'm a flagrant copyright scofflaw 2008-09-14 20:53 arrest me 2008-09-14 20:54 i do not expect that leak to last long 2008-09-14 20:59 config TUX3_FS 2008-09-14 20:59 tristate "Tux3 Versioning Filesystem" 2008-09-14 20:59 help 2008-09-14 20:59 To compile this file system support as a module, choose M here: the 2008-09-14 20:59 module will be called tux3. 2008-09-14 20:59 If unsure, say Maybe. 
2008-09-14 20:59 hrm its only the map in the sb inode 2008-09-14 21:00 in make_tux3 2008-09-14 21:00 seems odd since free_inode does indeed free the map unless its null 2008-09-14 21:00 the way those initializers work is dodgy 2008-09-14 21:00 structure assignments 2008-09-14 21:01 combined with desginated init = brainmuck 2008-09-14 21:01 probably should do it all with mallocs 2008-09-14 21:01 the fs init that is 2008-09-14 21:01 the reason for the cute little minimal struct decs is getting old 2008-09-14 21:38 -!- amey(~amey@116.73.35.180) has joined #tux3 2008-09-14 21:39 hi 2008-09-14 21:40 -!- amey(~amey@116.73.35.180) has left #tux3 2008-09-14 21:41 cat /proc/filesystems 2008-09-14 21:41 nodev sysfs 2008-09-14 21:41 nodev rootfs 2008-09-14 21:41 nodev bdev 2008-09-14 21:41 nodev proc 2008-09-14 21:41 nodev sockfs 2008-09-14 21:41 nodev pipefs 2008-09-14 21:41 nodev anon_inodefs 2008-09-14 21:41 nodev tmpfs 2008-09-14 21:41 nodev inotifyfs 2008-09-14 21:41 nodev devpts 2008-09-14 21:41 reiserfs 2008-09-14 21:41 ext3 2008-09-14 21:41 ext2 2008-09-14 21:41 nodev tux3 2008-09-14 21:41 nodev ramfs 2008-09-14 21:41 nodev hostfs 2008-09-14 21:41 nodev mqueu 2008-09-14 21:41 let's get rid of some useless ones 2008-09-14 21:42 anon_inodefs <- :p 2008-09-14 21:42 what is that? 
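The "structure assignments combined with designated init" complaint points at a C pitfall worth spelling out: assigning a compound literal to a struct replaces every member, and any member not named in the initializer becomes zero or NULL. If a pointer such as a buffer map was set before that assignment, it is silently overwritten and the allocation leaks, which is the kind of "definitely lost" record valgrind reports. This is a hypothetical illustration of the mechanism (`struct toy_inode`, `new_toy_map`, `init_buggy` and `init_fixed` are made up), not a diagnosis of the actual `make_tux3` leak.

```c
#include <stdlib.h>

struct toy_map { int dummy; };

struct toy_inode {
	unsigned long inum;
	struct toy_map *map;
};

static struct toy_map *new_toy_map(void)
{
	return calloc(1, sizeof(struct toy_map));
}

/* Buggy order: map is allocated, then wiped by the compound literal. */
static struct toy_map *init_buggy(struct toy_inode *inode)
{
	struct toy_map *map = new_toy_map();

	inode->map = map;
	*inode = (struct toy_inode){ .inum = 1 }; /* .map is implicitly NULL now */
	return map; /* returned only so the caller can see what was lost */
}

/* Safer: build the whole struct in one initializer, naming every member. */
static void init_fixed(struct toy_inode *inode)
{
	*inode = (struct toy_inode){ .inum = 1, .map = new_toy_map() };
}
```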
2008-09-14 21:42 crap 2008-09-14 21:42 haven't looked at it 2008-09-14 21:42 but I can tell from the name 2008-09-14 21:42 few other dodgy looking ones 2008-09-14 21:43 now, are job is to get rid of the nodev on tux3 2008-09-14 21:43 let's try to mount 2008-09-14 21:45 root@deep:~# mount -t tux3 tux3 /mnt 2008-09-14 21:45 root@deep:~# echo hello >/mnt/foo 2008-09-14 21:45 root@deep:~# cat /mnt/foo 2008-09-14 21:45 hello 2008-09-14 21:46 root@deep:~# mount 2008-09-14 21:46 /dev/ubda on / type ext2 (rw) 2008-09-14 21:46 proc on /proc type proc (rw) 2008-09-14 21:46 devpts on /dev/pts type devpts (rw,gid=5,mode=620) 2008-09-14 21:46 tux3 on /mnt type tux3 (rw) 2008-09-14 21:46 ok, time to check it in 2008-09-14 22:00 http://phunq.net/ddtree 2008-09-14 22:00 http://phunq.net/ddtree?p=tux3fs;a=summary 2008-09-14 22:01 just for now 2008-09-14 22:06 git... it's actually pretty bad 2008-09-14 22:06 compared to mercurial 2008-09-14 22:06 user unfriendly 2008-09-14 22:07 does not do what you expect 2008-09-14 22:18 you need to do commit -a 2008-09-14 22:18 to get what mercurial does for just commit 2008-09-14 22:19 and what any rational person would want 2008-09-14 22:22 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-14 22:34 nice 2008-09-14 22:42 http://shapor.com/tux3/ updated 2008-09-14 22:50 :) 2008-09-14 22:51 shapor, when's the next round of updates on the design doc? 
2008-09-14 22:56 i was just thinking about that 2008-09-14 23:39 -!- nataliep_(~nataliep@207.47.98.129.static.nextweb.net) has joined #tux3 2008-09-14 23:45 so, fuse's pkg-config wants -D_FILE_OFFSET_BITS and so does everything else, or else off_t will be wrong (current just diskio has it) 2008-09-14 23:45 the only problem is, if I put -D_FILE_OFFSET_BITS on everything, then it shoes up twice in the fuse compile 2008-09-14 23:45 esthetically irritating 2008-09-14 23:46 well, it's just going to be that way 2008-09-14 23:46 and our build will start to suck, just like every build 2008-09-15 01:17 hey 2008-09-15 01:22 hi 2008-09-15 01:22 new lkml post just when out 2008-09-15 01:23 "Tux3 Report: What next?" 2008-09-15 01:28 http://lkml.org/lkml/2008/9/15/23 2008-09-15 01:52 -!- konrad(~konrad@c-24-16-74-109.hsd1.wa.comcast.net) has joined #tux3 2008-09-15 02:10 just read the post 2008-09-15 02:10 good advertizement :) 2008-09-15 02:11 that's the idea 2008-09-15 02:12 ok night :) 2008-09-15 02:45 http://www.letterp.com/~dbg/practical-file-system-design.pdf <- a book on filesystem design 2008-09-15 02:45 I should read it 2008-09-15 02:45 learn something 2008-09-15 03:06 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-09-15 03:56 flips: git has another level between the repository and your checkout. that's why you need to do the commit -a 2008-09-15 03:56 you can just alias it though 2008-09-15 04:47 -!- data(~data@echo489.server4you.de) has joined #tux3 2008-09-15 07:02 -!- eli(~elicriffi@66.249.86.209) has joined #tux3 2008-09-15 07:48 -!- Kirantpatil(~kiran@122.167.194.220) has joined #tux3 2008-09-15 07:48 -!- Kirantpatil(~kiran@122.167.194.220) has left #tux3 2008-09-15 08:08 -!- amey(~amey@116.73.35.180) has joined #tux3 2008-09-15 08:08 100 2008-09-15 08:08 100! 
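On the `-D_FILE_OFFSET_BITS` point: with glibc on 32-bit targets, `off_t` is 32 bits unless `_FILE_OFFSET_BITS=64` is defined before any system header is included, and every translation unit that passes `off_t` across an interface has to agree, which is why the flag ends up on every compile line rather than only on diskio. A minimal build-time check, assuming glibc; on 64-bit targets `off_t` is already 64 bits and the define is harmless. `block_to_offset` and `blockbits` are hypothetical names.

```c
/* Must come before any #include; normally supplied as -D_FILE_OFFSET_BITS=64. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif
#include <sys/types.h>

/* Fail the build if this translation unit still sees a 32-bit off_t. */
_Static_assert(sizeof(off_t) == 8, "off_t is not 64 bits; check _FILE_OFFSET_BITS");

/* Hypothetical helper: byte offset of a block, given log2 of the block size. */
static off_t block_to_offset(off_t block, unsigned blockbits)
{
	return block << blockbits;
}
```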
2008-09-15 08:09 -!- amey(~amey@116.73.35.180) has left #tux3 2008-09-15 09:21 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-15 09:46 -!- Kirantpatil(~kiran@122.167.211.98) has joined #tux3 2008-09-15 09:46 -!- Kirantpatil(~kiran@122.167.211.98) has left #tux3 2008-09-15 10:13 -!- Kirantpatil(~kiran@122.167.211.98) has joined #tux3 2008-09-15 10:13 -!- Kirantpatil(~kiran@122.167.211.98) has left #tux3 2008-09-15 11:48 data, the thing is, it is not clear that git needs that extra level 2008-09-15 11:49 in fact, hg makes it clear that it doesn't 2008-09-15 12:23 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-15 12:23 folks 2008-09-15 12:37 -!- pgquiles_(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-09-15 12:40 hi bh 2008-09-15 12:40 hey maze 2008-09-15 12:40 see, today there is a tux3 filesystem in kernel 2008-09-15 12:40 that's the good news, the bad news is it's really just ramfs 2008-09-15 12:54 url to the post ? 2008-09-15 12:55 http://lkml.org/lkml/2008/9/15/23 2008-09-15 12:55 flips: tux3.org seems... slooow 2008-09-15 12:56 shapor, true 2008-09-15 12:56 don't know why 2008-09-15 12:56 getting lots of traffic? 2008-09-15 12:56 let's see what traffic I've got 2008-09-15 12:56 probably not 2008-09-15 12:56 haven't been monitoring 2008-09-15 12:56 could just be crappy linux vm stepping on itself 2008-09-15 12:56 means: close firefox 2008-09-15 12:57 perhaps git tree getting crawled 2008-09-15 12:57 googlebot is hammering me 2008-09-15 12:57 really hammering 2008-09-15 12:58 fsck 2008-09-15 12:58 DoS 2008-09-15 12:58 oh god 2008-09-15 12:58 it's indexing my git tree 2008-09-15 12:58 stupid, stupid bot 2008-09-15 12:58 /kickban googlebot 2008-09-15 12:58 thats easy 2008-09-15 12:58 robots.txt 2008-09-15 12:59 suggestions? 2008-09-15 12:59 where do I put it? 
2008-09-15 12:59 http://en.wikipedia.org/wiki/Robots.txt 2008-09-15 12:59 put it in / 2008-09-15 12:59 User-agent: * 2008-09-15 12:59 Crawl-delay: 10 2008-09-15 12:59 damn I thought I could use the shapedia 2008-09-15 13:00 that will make it wait 10 seconds between requests to your box 2008-09-15 13:00 oh good I can 2008-09-15 13:00 I only want it to stay out of the git tree 2008-09-15 13:00 anything else it's welcome to index 2008-09-15 13:00 you can put in a disallow line then 2008-09-15 13:00 why not let it crawl the git tree with a 10 sec delay ? 2008-09-15 13:00 you can disallow like 2008-09-15 13:01 Disallow: /ddtree/ 2008-09-15 13:01 just think how many millions of dollars worth of storage git + lxr consume in goog datacenters 2008-09-15 13:01 so 2008-09-15 13:01 User-agent: * 2008-09-15 13:01 Disallow: /ddtree/ 2008-09-15 13:01 add some intelligence to googlebot, then donate 1/2 the savings to oss projects 2008-09-15 13:01 hrm although 2008-09-15 13:02 maybe you dont want the trailing slash 2008-09-15 13:02 in fact i dont think you do 2008-09-15 13:02 ACTION puts that in the suggestion box 2008-09-15 13:09 you should just disallow areas not meant to be indexed ;-) 2008-09-15 13:10 crawl-delay has a tendency to me of little actual use 2008-09-15 13:10 s/me/be/ 2008-09-15 13:10 MaZe: it should at least spread out the pain ;) 2008-09-15 13:15 mayhaps 2008-09-15 15:19 daniel@moonbase:/src/2.6.26.5.tux3$ git add robots.txt 2008-09-15 15:19 daniel@moonbase:/src/2.6.26.5.tux3$ git add fs/robots.txt 2008-09-15 15:19 daniel@moonbase:/src/2.6.26.5.tux3$ git diff 2008-09-15 15:19 daniel@moonbase:/src/2.6.26.5.tux3$ git diff -a 2008-09-15 15:19 no output 2008-09-15 15:19 fscking git 2008-09-15 15:19 I know there is a way, but how about it should just work 2008-09-15 15:19 good example of why we should mainly work with mercurial 2008-09-15 15:23 git diff HEAD 2008-09-15 15:24 true 2008-09-15 15:24 or git diff WANK 2008-09-15 15:24 ;-) 2008-09-15 15:24 thanks 2008-09-15 
15:24 I like git ;-) 2008-09-15 15:24 I do too 2008-09-15 15:24 but not nearly as much as mercurial 2008-09-15 15:25 I'm using git for tinyos... what would be the reasons to switch to mercurial? :D 2008-09-15 15:29 RazvanM, mercurial makes it much easier for new contributers to get up to speed 2008-09-15 15:29 and doesn't get in the way of git experts 2008-09-15 15:30 basically, you just don't type the options that always seemed a little odd 2008-09-15 15:30 my ramp up for mercurial after git was about, um, 10 minutes 2008-09-15 15:30 same for Shapor I think 2008-09-15 15:30 Git took "some getting used to" 2008-09-15 15:30 hit more bugs in git than mercurial too 2008-09-15 15:30 and some things that should be bugs 2008-09-15 15:31 but are instead treated as features 2008-09-15 15:32 :D 2008-09-15 15:32 I definitely agree that it took me some time to get used to git 2008-09-15 15:33 for the dealing with changes for tinyos was very good for me though 2008-09-15 15:33 anyway as you can see I'm being even handed and using both 2008-09-15 15:33 puts me in a better position to complain about git ;) 2008-09-15 15:33 that is true ;-) 2008-09-15 15:33 tried mercurial? 
2008-09-15 15:33 well you must 2008-09-15 15:33 if you have tux3 checked out 2008-09-15 15:34 for tux3 I had a chance to play with it 2008-09-15 15:34 it's just amazing how it seems to do the right thing by default 2008-09-15 15:34 the only whine I've had about it is, there isn't a simple command to delete a head 2008-09-15 15:34 there are a couple of longish commands available 2008-09-15 15:35 I did some changes to make it run on mac but then the new updates failed so I removed the files :P 2008-09-15 15:35 sorry 2008-09-15 15:35 I'll try to make fewer changes ;) 2008-09-15 15:35 hehe 2008-09-15 15:35 not your fault 2008-09-15 15:35 see if you can come up with some guidlines to make that smoother 2008-09-15 15:35 stuff I can do to make life easier for you 2008-09-15 15:37 I hope I'll find some time to try to make it work on mac 2008-09-15 15:37 these days you could probably make it work on a cell phone 2008-09-15 15:37 certainly a Nokia 800 series 2008-09-15 15:37 one thing that I did was to change mode_t to tux3_mode_t to avoid the collisions with the system's one 2008-09-15 15:37 :D :D :D 2008-09-15 15:38 want to send a patch or should I just do that? 2008-09-15 15:38 I'll do that 2008-09-15 15:38 what does it collide with? 2008-09-15 15:38 some libc thing? 2008-09-15 15:38 I think so 2008-09-15 15:38 could you find out? 
2008-09-15 15:38 I need to reboot to see if my kernel panic is still in 10.5.5 2008-09-15 15:38 I'll be back 2008-09-15 15:39 (with the answer about mode_t :D) 2008-09-15 15:39 tux3 doesn't have a mode_t 2008-09-15 15:40 it uses the libc mode_t 2008-09-15 15:40 man 2 stat 2008-09-15 15:41 oops he's gone 2008-09-15 15:41 understandably 2008-09-15 15:47 i don't like the fact hg is python 2008-09-15 15:48 I don't care much 2008-09-15 15:48 I don't like the fact that guido doesn't support native code gen for python 2008-09-15 15:48 iow, guido is more of a problem than matt 2008-09-15 16:04 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-15 16:05 wb 2008-09-15 16:05 damn... they didn't fix my kernel panic :( 2008-09-15 16:05 my robots.txt fu is insufficient 2008-09-15 16:05 help me ;) 2008-09-15 16:05 who didn't? 2008-09-15 16:05 apple :P 2008-09-15 16:06 they just released 10.5.5 2008-09-15 16:06 how yahoobot it beating me up 2008-09-15 16:06 now 2008-09-15 16:06 ah 2008-09-15 16:06 linux seldom panics 2008-09-15 16:06 maybe apple will see the light 2008-09-15 16:06 about the mote_t: ./stdlib.h:typedef __darwin_mode_t mode_t; 2008-09-15 16:07 in my case it panics with a one line C program :P 2008-09-15 16:07 so... 2008-09-15 16:07 in exaclty what way is it incompatible? 2008-09-15 16:08 tux3 was also typedef-ing mode_t I think... 2008-09-15 16:08 perhaps it doesn't anymore :D 2008-09-15 16:10 it does not from what I see... 2008-09-15 16:12 I need to go back on some graphs for a paper. I'll try to make an attempt to get tux3 to compile on mac later tonight. 
2008-09-15 16:17 cu 2008-09-15 16:17 tux3 u tomorrow ;) 2008-09-15 16:18 -!- shapor(~shapor@yzf.shapor.com) has joined #tux3 2008-09-15 16:19 typedef __uint16_t __darwin_mode_t; 2008-09-15 16:19 lamerz 2008-09-15 16:22 robots.txt standard is really lame 2008-09-15 16:22 only works in the root of the server 2008-09-15 16:22 "standard" :p 2008-09-15 16:23 yep 2008-09-15 16:23 tux3 uses mode_t in just one place: fuse 2008-09-15 16:23 does fuse work on mac? 2008-09-15 16:23 so send your complaint to our fuser department ;) 2008-09-15 16:23 I wonder 2008-09-15 16:24 would be weird hmm? 2008-09-15 16:24 mhm 2008-09-15 16:24 but fuse won't compile by default 2008-09-15 16:24 the pkg-config will just not fire 2008-09-15 16:24 I think 2008-09-15 16:24 ifeq ($(shell pkg-config fuse && echo found), found) 2008-09-15 16:24 binaries += tux3fs tux3fuse 2008-09-15 16:24 endif 2008-09-15 16:24 right 2008-09-15 16:24 so if there's a mode_t problem, it's not in the tux3 source 2008-09-15 16:25 nice piece of make scripting by the way 2008-09-15 16:25 who did that again? 2008-09-15 16:25 RazvanM maybe 2008-09-15 16:37 moonbase:/var/www# cat robots.txt 2008-09-15 16:37 User-agent: * 2008-09-15 16:37 Disallow: / 2008-09-15 16:38 I'll refine that later 2008-09-15 16:38 needless to say, googlebot is still bothering me 2008-09-15 16:38 flips: you know robots.txt does not work for rogue bots, do you? 
2008-09-15 16:39 "don't take no for an answer" -- googlebot 2008-09-15 16:39 pgquiles, if I see a rogue bot I know what to do 2008-09-15 16:39 I can say that googlebot is very impolite 2008-09-15 16:39 :-) 2008-09-15 16:39 has no concept of staying within a reasonable share of bandwidth 2008-09-15 16:40 obviously, googlebot is coded and maintained by "smart people" ;) 2008-09-15 16:40 :-D 2008-09-15 16:48 oh, msnbot has joined the party 2008-09-15 16:48 everybody knows where there's a good party it seems 2008-09-15 16:49 now, I want to explain to them: crawl my site, just don't index every version of linus's git tree 2008-09-15 16:49 please 2008-09-15 16:57 tell someone at google about git 2008-09-15 17:01 flips: with a robots.txt like that it is likely you won't show up in search results 2008-09-15 17:02 I know 2008-09-15 17:02 I needed some peace 2008-09-15 17:02 while I write the real file 2008-09-15 17:02 Disallow: /ddtree didn't work ? 2008-09-15 17:02 i guess you dont really care about phunq.net anyway 2008-09-15 17:02 let's try it 2008-09-15 17:03 from my access logs it looks like most bots only grab robots.txt before a scan 2008-09-15 17:03 not before earch request 2008-09-15 17:03 so if they have already started crawling it might just keep continuing 2008-09-15 17:03 i dunno 2008-09-15 17:04 I know 2008-09-15 17:04 that's impolite 2008-09-15 17:04 also for your disallow i think you need to give full paths 2008-09-15 17:04 so 2008-09-15 17:04 Disallow: fifo.c 2008-09-15 17:04 specially for one like googlebot with zigabucks spent on it 2008-09-15 17:04 won't do anything 2008-09-15 17:04 true 2008-09-15 17:04 and it has to be in the root 2008-09-15 17:04 all of which is work 2008-09-15 17:04 if you run apache you could add a disallow for googlebot in the .htaccess 2008-09-15 17:04 which isn't pleasant when my system is getting hammered 2008-09-15 17:04 based on user agent 2008-09-15 17:05 does it work for nonhtml files? 
2008-09-15 17:05 I guess 2008-09-15 17:05 yes 2008-09-15 17:05 but you have to name the bot 2008-09-15 17:05 right? 2008-09-15 17:05 yeah 2008-09-15 17:05 funny how these standards don't get improved 2008-09-15 17:05 intrenched 2008-09-15 17:05 hard to change 2008-09-15 17:06 also no interest from goog in making it easy to exclude the bot, it's all or nothing, or pain 2008-09-15 17:06 just look for +http:// in the user agent 2008-09-15 17:06 same with the others of course 2008-09-15 17:06 i mean cmon we use http + javascript for "pushing" data to a browser 2008-09-15 17:06 semi-polite bots all have that 2008-09-15 17:06 whcih is just a polling pull loop 2008-09-15 17:06 its all crap 2008-09-15 17:07 but xml makes it better ;) 2008-09-15 17:07 ACTION pukes a little in his mouth 2008-09-15 17:07 flips: .htpasswd? 2008-09-15 17:07 for the moment no robot's are hitting me 2008-09-15 17:07 pgquiles_, but the site needs to be public 2008-09-15 17:08 moonbase:/var/www# cat robots.txt 2008-09-15 17:08 User-agent: * 2008-09-15 17:08 Disallow: /ddtree 2008-09-15 17:08 ok? 
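Why `Disallow: /ddtree` is enough and `/ddtree/` would not be: in the original robots exclusion standard, a Disallow value is matched as a plain prefix of the requested path, so any URL starting with it is excluded. The gitweb URLs here look like `/ddtree?p=tux3fs;a=summary`, which begin with `/ddtree` but not `/ddtree/`, and a bare filename such as `fifo.c` matches nothing because real paths start with `/`. A sketch of that rule in C; `robots_disallows` is a hypothetical helper and ignores later extensions such as wildcards and Allow lines.

```c
#include <string.h>

/*
 * Hypothetical helper: does a single Disallow value exclude this path?
 * Original-standard semantics: plain prefix match on the path (including
 * any query string). An empty Disallow value excludes nothing.
 */
static int robots_disallows(const char *rule, const char *path)
{
	size_t n = strlen(rule);

	if (n == 0)
		return 0;
	return strncmp(path, rule, n) == 0;
}
```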
2008-09-15 17:08 seems reasonable 2008-09-15 17:09 I think the bots are staying away until their next sniff cycle 2008-09-15 17:09 http://www.whitehouse.gov/robots.txt 2008-09-15 17:09 because of the / 2008-09-15 17:09 interesting 2008-09-15 17:09 from before 2008-09-15 17:09 thats the first result for robots.txt disallow on google 2008-09-15 17:10 that's the one site that should be completely indexed 2008-09-15 17:10 without option 2008-09-15 17:10 heh 2008-09-15 17:10 the only url it is pulling is /ddtree 2008-09-15 17:10 User-agent: whsearch 2008-09-15 17:10 everything else are just paremeters being passed to it 2008-09-15 17:11 so i think /ddtree is sufficient 2008-09-15 17:11 can search an extra dozen things 2008-09-15 17:11 ok 2008-09-15 17:11 later I will refine that 2008-09-15 17:11 so it allows searching the tux3 part of ddtree 2008-09-15 17:11 and only that 2008-09-15 17:27 it's time to do extents 2008-09-15 17:27 never mind I claimed I'd do versioning next on lkml 2008-09-15 17:27 extends = benchmarkability 2008-09-15 17:27 easy 2008-09-15 17:28 and that big zfs bully won't kick sand in the face of skinny little tux3 any more 2008-09-15 17:28 tux3 will learn ju-extent-fu 2008-09-15 17:29 well where do we start 2008-09-15 17:29 the hardest part is actually versioned extents, so it's convenient that's the part getting deferred 2008-09-15 17:29 -!- pgquiles__(~pgquiles@50.Red-79-153-248.staticIP.rima-tde.net) has joined #tux3 2008-09-15 17:30 we need to be able to alloc an extent for one thing 2008-09-15 17:30 so the bitmap scanning gets fancier 2008-09-15 17:30 not just searching for a bit any more, but a contiguous run of bits 2008-09-15 17:30 and sometimes it might be best for it to say: here's the longest run I found in that region 2008-09-15 17:31 instead of just failing because it didn't find the length asked 2008-09-15 17:31 gets into heuristics 2008-09-15 17:31 then what else 2008-09-15 17:32 extents only appear in dleaf 2008-09-15 17:32 so... 
2008-09-15 17:32 caller of deaf methods has impact 2008-09-15 17:32 those are in btree.c and inode.c 2008-09-15 17:32 I wonder if there is any btree.c impact 2008-09-15 17:32 there should not be 2008-09-15 17:33 actually, btree.c never even knows its calling dleaf methods 2008-09-15 17:33 so I guess the entire impact is in inode.c 2008-09-15 17:33 maybe 2008-09-15 17:33 and balloc.c 2008-09-15 17:33 big and 2008-09-15 17:34 yeah, so that makes them easy right? :P 2008-09-15 17:34 relatively 2008-09-15 17:34 compard to versioned extents 2008-09-15 17:34 should just make them versioned to begin with 2008-09-15 17:34 but still one of the messier bits so far 2008-09-15 17:34 my head hurts just thinking about it 2008-09-15 17:34 defer 2008-09-15 17:34 haha 2008-09-15 17:34 gain experience 2008-09-15 17:34 with the simpler case 2008-09-15 17:34 that is like, at the core of tux3 though 2008-09-15 17:35 oh yeah 2008-09-15 17:35 heart and soul 2008-09-15 17:35 but winning benchmarks is too 2008-09-15 17:35 and I can smell blood ;) 2008-09-15 17:35 without versioning, boo 2008-09-15 17:35 :P 2008-09-15 17:35 doesn't bother me 2008-09-15 17:35 yeah 2008-09-15 17:35 walk first 2008-09-15 17:35 if versioning arrives a month later 2008-09-15 17:35 run later 2008-09-15 17:35 then jump 2008-09-15 17:35 then arabesque 2008-09-15 17:35 teleport 2008-09-15 17:36 right 2008-09-15 17:36 then stretch space and time so travel isn't necessary anymore 2008-09-15 17:36 that's called multicore 2008-09-15 17:36 heh 2008-09-15 17:36 or is it qubits? 
2008-09-15 17:36 bubytes would rule 2008-09-15 17:36 drugs i think 2008-09-15 17:36 qubytes 2008-09-15 17:37 that too 2008-09-15 17:37 just imagine the great hack 2008-09-15 17:37 "in xanadu did kubla geek a stately filesystem arch decree" 2008-09-15 17:38 -- anon 2008-09-15 17:42 ok, I"m going to get another eee 2008-09-15 17:42 seems one isn't enough 2008-09-15 17:42 wow your original tux3 announcement is still in the hottest threads on lkml 2008-09-15 17:42 the wife likes it ;) 2008-09-15 17:42 so it is, pushed down a little by the time travel 2008-09-15 17:43 probably a plot by linus to get his spot back 2008-09-15 17:43 and who is frans pop, is that a real name? ;) <- jk 2008-09-15 17:44 wow, what next is in the top 2008-09-15 17:44 right after -rc6 and time travel 2008-09-15 17:46 getting close to sk8 oclock 2008-09-15 17:53 ACTION flips_rollin 2008-09-15 19:43 flips: http://milek.blogspot.com/2008/03/zfs-de-duplication.html 2008-09-15 19:46 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-15 20:02 -!- tux3bot(~tux3bot@yzf.shapor.com) has joined #tux3 2008-09-15 20:03 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-15 20:05 -!- Kirantpatil(~kiran@122.167.199.254) has joined #tux3 2008-09-15 20:05 -!- Kirantpatil(~kiran@122.167.199.254) has left #tux3 2008-09-15 20:06 -!- ChanServ changed mode/#tux3 -> -o hirofumi 2008-09-15 20:06 -!- ChanServ changed topic to "Tux3 list membership just hit 100! ~ http://tux3.org" 2008-09-15 21:04 -!- bh(~billh@ip68-107-26-122.sd.sd.cox.net) has joined #tux3 2008-09-15 21:04 back 2008-09-15 21:04 that was a bit wierd 2008-09-15 21:07 flips: what do you think about the link ? 2008-09-15 21:25 -!- Aks(~ankitsriv@59.90.32.1) has joined #tux3 2008-09-15 22:13 -!- Aks(~ankitsriv@59.90.32.1) has left #tux3 2008-09-15 23:49 flips: you there ? 2008-09-16 01:29 `now 2008-09-16 01:30 autokilled... 
graphic 2008-09-16 01:32 march, it's old 2008-09-16 01:33 truth is, I fully trust in our students to out deduplicate any weeny engineers 2008-09-16 01:35 "what next" is #5 on lkml.org 2008-09-16 01:35 #6 2008-09-16 01:41 hehe 2008-09-16 01:43 hey maze 2008-09-16 01:44 hey 2008-09-16 01:44 we need to get your bio transfer working 2008-09-16 01:44 20 lines or less is defined as "working" 2008-09-16 01:44 ;-) 2008-09-16 01:44 just write a simple endio 2008-09-16 01:44 yeah, I've mostly been reading docs all sunday 2008-09-16 01:44 and browsing the source code 2008-09-16 01:45 put your submitter into sleep on a wait queue 2008-09-16 01:45 that's it 2008-09-16 01:45 one line wait queue declaration 2008-09-16 01:45 I could probably write something that works now, but I need to setup a better (less likely to crash machine) debug scenario than insmod into running machine 2008-09-16 01:45 naw 2008-09-16 01:45 and now it's unfortunately the work week ;-) 2008-09-16 01:45 write something that works 2008-09-16 01:45 forget about debugging 2008-09-16 01:45 hehe 2008-09-16 01:45 it works or it doesn' 2008-09-16 01:45 doesn't 2008-09-16 01:46 you'll know by the amount of smoke 2008-09-16 01:46 I don't like my work machine smoking though ;-) 2008-09-16 01:46 you're not going to make me write it for you? 2008-09-16 01:46 oh, wait a minute 2008-09-16 01:46 there should be a simple-ish solution 2008-09-16 01:46 there is 2008-09-16 01:47 sleep 2008-09-16 01:47 endio wakes 2008-09-16 01:47 simple 2008-09-16 01:47 no, no, no - not mucking around with disk io on my live machine 2008-09-16 01:47 ACTION has to pop some popcorn 2008-09-16 01:47 I've already gone through a painful weekend of data recovery 2008-09-16 01:47 why not? 2008-09-16 01:47 trust me 2008-09-16 01:47 nothing will break 2008-09-16 01:47 I trust you... I don't trust myself. 
2008-09-16 01:47 except you might have to reboot ;) 2008-09-16 01:47 but probably not 2008-09-16 01:48 hard to go wrong with a read 2008-09-16 01:48 these days, your task can oops and the machine keeps right on running 2008-09-16 01:49 if you fear, compile uml 2008-09-16 01:49 make ARCH=um 2008-09-16 01:50 hmm 2008-09-16 01:50 ok, trying to write something 2008-09-16 01:50 will try pasting here in a moment 2008-09-16 01:51 deal 2008-09-16 01:51 ACTION goes to write some extent code 2008-09-16 01:51 totally drunk 2008-09-16 01:51 prolly better concentrate on the popcorn 2008-09-16 01:52 we had to celebrate tim's twins tonight 2008-09-16 01:55 balloc_from_range has to become balloc_extent_from_range 2008-09-16 01:56 going to be a mess 2008-09-16 01:56 fortunately, balloc_from_range is pretty tight 2008-09-16 01:57 going to be tighter with big endian scan 2008-09-16 02:17 block_t balloc_extent_from_range(struct inode *inode, block_t start, block_t count, unsigned length) 2008-09-16 02:17 declare looks could 2008-09-16 02:17 could somebody implement this please? 2008-09-16 02:18 ;-) 2008-09-16 02:22 maze, you need my uml recipe 2008-09-16 02:22 hmm, wouldn't mind 2008-09-16 02:22 nothing to fear except fear itself 2008-09-16 02:22 ok 2008-09-16 02:22 would you imagine I just roped myself into helping out someone for work... 2008-09-16 02:22 work? 2008-09-16 02:22 what's that? 2008-09-16 02:23 hehe - that annoying aspect of life 2008-09-16 02:24 that occupies 5pm-9am 2008-09-16 02:25 maze, wget http://phunq.net/root_fs 2008-09-16 02:25 100M 2008-09-16 02:26 exactly 2008-09-16 02:34 20% 2008-09-16 02:34 you've got slow uplink ;-) 2008-09-16 02:35 I could have pulled a dvd off of kernel.org by now 2008-09-16 02:36 we'll move 2008-09-16 02:36 to a faster host 2008-09-16 02:36 pretty soon 2008-09-16 02:36 this host is my desktop 2008-09-16 02:36 kernel.org hosts dvds now? 
2008-09-16 02:37 not kernel.org 2008-09-16 02:37 home grown 2008-09-16 02:37 homeboy 2008-09-16 02:37 in response to MaZe's comment 2008-09-16 02:37 ;) 2008-09-16 02:37 got it? 2008-09-16 02:38 28% 2008-09-16 02:38 wah 2008-09-16 02:38 sucks 2008-09-16 02:38 what isp is your desktop on? 2008-09-16 02:38 speakeasy 2008-09-16 02:38 they are looking sucky 2008-09-16 02:38 especially at the price I pay 2008-09-16 02:39 granted comcast sent out a letter saying that starting october users going over 250G a month have to pay extra 2008-09-16 02:39 I'm nowhere close 2008-09-16 02:39 internet is barbaric in us of a 2008-09-16 02:40 primitive savages 2008-09-16 02:40 yep 2008-09-16 02:40 I don't come close either, at least I don't think 2008-09-16 02:40 -!- pgquiles__(~pgquiles@50.Red-79-153-248.staticIP.rima-tde.net) has joined #tux3 2008-09-16 02:40 I could if upload was better ... 2008-09-16 02:40 ok, how's it? 2008-09-16 02:40 you have to average roughly 0.8mbit a month to get 250G 2008-09-16 02:40 34% 2008-09-16 02:40 er, 0.8mbit/s constantly 2008-09-16 02:40 cuz I have to write the rest of the recipe by the time you get it 2008-09-16 02:40 good 2008-09-16 02:40 gives me time to think 2008-09-16 02:40 39.5KB/s 2008-09-16 02:40 lol 2008-09-16 02:41 bleah 2008-09-16 02:41 modem 2008-09-16 02:41 fyi: 2008-09-16 02:41 maze@athina:~$ wget http://mirrors.kernel.org/centos/4.7/isos/x86_64/CentOS-4.7-x86_64-binDVD.iso 2008-09-16 02:41 --02:39:10-- http://mirrors.kernel.org/centos/4.7/isos/x86_64/CentOS-4.7-x86_64-binDVD.iso 2008-09-16 02:41 => `CentOS-4.7-x86_64-binDVD.iso' 2008-09-16 02:41 Resolving mirrors.kernel.org... 204.152.191.39, 204.152.191.7 2008-09-16 02:41 Connecting to mirrors.kernel.org|204.152.191.39|:80... connected. 2008-09-16 02:41 HTTP request sent, awaiting response... 
200 OK 2008-09-16 02:41 Length: 2,699,399,168 (2.5G) [application/x-iso9660-image] 2008-09-16 02:41 100%[==================================>] 2,699,399,168 13.21M/s ETA 00:001 2008-09-16 02:41 02:41:15 (21.04 MB/s) - `CentOS-4.7-x86_64-binDVD.iso' saved [2699399168/2699399168] 2008-09-16 02:41 maze@athina:~$ 2008-09-16 02:42 38% 2008-09-16 02:42 m 2008-09-16 02:42 ah right 2008-09-16 02:42 ok, I live in santa monica 2008-09-16 02:42 give me a break 2008-09-16 02:42 ;-) 2008-09-16 02:43 I find kernel.org to be ridiculously fast though 2008-09-16 02:43 never a good benchmark 2008-09-16 02:43 not a coincidence 2008-09-16 02:43 wow 2008-09-16 02:43 21MB/s is impressive 2008-09-16 02:43 I don't think I can write to my harddrive that fast 2008-09-16 02:43 kernel.org is not far from the backbone 2008-09-16 02:44 I can 2008-09-16 02:44 yeah, few ms ping time 2008-09-16 02:44 I can do 60MB/s 2008-09-16 02:44 I can max out gigabit 2008-09-16 02:44 I actually know kernel.org does readahead buffering in 64mb chunks 2008-09-16 02:44 max out gigabit = 60 MB/sec 2008-09-16 02:44 before you hit chipset 2008-09-16 02:45 unless it's a servers chipset 2008-09-16 02:45 because I can see 64MB fly in in 0.7 seconds, then 1-2 second wait as kernel.org reads in the next 64mb 2008-09-16 02:45 nah, my laptop does 117 MB/s tcp 2008-09-16 02:45 assuming there's no disk IO involved 2008-09-16 02:45 no it doesn't 2008-09-16 02:45 the disk is much slower of course 2008-09-16 02:45 tested - it does 2008-09-16 02:45 well 2008-09-16 02:46 true 2008-09-16 02:46 that's the max 2008-09-16 02:46 unbellievable 2008-09-16 02:47 ok, the rest of the recipe: 2008-09-16 02:47 [maze@nike ~]$ time dd if=/dev/zero bs=65536 count=20480 | ssh -c arcfour128 maze@athina cat \> /dev/null 2008-09-16 02:47 20480+0 records in 2008-09-16 02:47 20480+0 records out 2008-09-16 02:47 1342177280 bytes (1.3 GB) copied, 17.1915 s, 78.1 MB/s 2008-09-16 02:47 real 0m17.202s 2008-09-16 02:47 user 0m11.025s 2008-09-16 02:47 sys 
0m5.620s 2008-09-16 02:47 that's with both systems actually doing work 2008-09-16 02:47 make defconfig ARCH=um && make lines ARCH=um && ./linus ubdo=root_fs 2008-09-16 02:48 and notice with ssh 2008-09-16 02:48 (spot the typo) 2008-09-16 02:48 lines? 2008-09-16 02:48 typo #1 2008-09-16 02:49 ubd or ubdo - not sure 2008-09-16 02:49 ./linus - weird 2008-09-16 02:49 make defconfig ARCH=um && make linux ARCH=um && ./linux ubd0=root_fs 2008-09-16 02:49 but I really have never used this so I'm guessing 2008-09-16 02:49 ah 2008-09-16 02:50 55% 2008-09-16 02:50 I told you I was totally drunk 2008-09-16 02:50 later you will realize you could have created your own root_fs 5x faster than downloading from me 2008-09-16 02:51 only thing is, it might not have nano in it 2008-09-16 02:51 and it might now be 105 MB 2008-09-16 02:51 well, I'm actually writing the code now, so it parallelizes 2008-09-16 02:51 besides I actually have a rootfs somewhere around here 2008-09-16 02:51 it's also 100mb or so 2008-09-16 02:51 mine is better 2008-09-16 02:51 includes ssh/mc a few other things 2008-09-16 02:51 sshd of course 2008-09-16 02:51 reboot time! 2008-09-16 02:51 that's good 2008-09-16 02:51 reboot konrad 2008-09-16 02:52 new kernel yay 2008-09-16 02:52 i can get > 1 reboot second with uml 2008-09-16 02:53 well my move in 2 days gets me a little closer to the backbone 2008-09-16 02:53 I am happy 2008-09-16 02:53 I would be too 2008-09-16 02:53 my connection sucks 2008-09-16 02:54 alright, reboot for real this time 2008-09-16 02:54 I might blog about speakeasy 2008-09-16 02:54 see if that gets it upgraded 2008-09-16 02:55 we are signed up for FAST by the way 2008-09-16 02:55 in advance 2008-09-16 02:55 poseter seeion, WIP report and BOF 2008-09-16 02:56 session 2008-09-16 02:57 konrad's not back 2008-09-16 02:57 seem, reboots take longer the faster cpus get 2008-09-16 02:57 bio_alloc(GFS_KERNEL, 1); 2008-09-16 02:57 wicked... 
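The rates quoted above can be checked with a little arithmetic. This sketch assumes decimal units (1 GB = 10^9 bytes and 1 MB = 10^6 bytes, which is what dd prints) and a 30-day month. It also assumes a 1500-byte MTU with TCP timestamps for the gigabit ceiling. The helper names are illustrative.

```c
/* Sustained bit rate (bits/s) needed to move `bytes` in `seconds`. */
static double avg_bits_per_sec(double bytes, double seconds)
{
	return bytes * 8.0 / seconds;
}

/* Throughput in decimal MB/s, the unit dd reports. */
static double mb_per_sec(double bytes, double seconds)
{
	return bytes / seconds / 1e6;
}

/* TCP payload ceiling on gigabit Ethernet in MB/s, assuming a 1500-byte
 * MTU and TCP timestamps. Each frame costs 1538 bytes of wire time
 * (1500 + 14 header + 4 FCS + 8 preamble + 12 inter-frame gap) and
 * carries 1448 payload bytes (1500 - 20 IP - 20 TCP - 12 options). */
static double gige_tcp_payload_mb_per_sec(void)
{
	const double wire_bytes_per_sec = 1e9 / 8.0;
	return wire_bytes_per_sec * 1448.0 / 1538.0 / 1e6;
}
```

Under those assumptions, 250 GB per month works out to about 0.77 Mbit/s sustained. The dd run is 1342177280 bytes / 17.1915 s, or about 78.1 MB/s. A single TCP stream over gigabit tops out near 117.7 MB/s, so a measured 117 MB/s is essentially line rate.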
2008-09-16 02:57 72% 2008-09-16 02:57 pathetic 2008-09-16 02:58 yea! 2008-09-16 02:58 I have a race in my code 2008-09-16 02:58 getting a race on sleep is easy 2008-09-16 02:59 that race? 2008-09-16 03:00 nah 2008-09-16 03:00 possibility to remove module, before bio comes back 2008-09-16 03:00 true 2008-09-16 03:00 don't worry about it 2008-09-16 03:00 just don't have twitchy fingers 2008-09-16 03:01 if you want to fix the race 2008-09-16 03:01 flips: ok 2008-09-16 03:01 see module_inc 2008-09-16 03:01 or whatever it's called 2008-09-16 03:01 just wanted to know if you knew about that news 2008-09-16 03:01 forget 2008-09-16 03:01 always 2008-09-16 03:01 lived and breathed it 2008-09-16 03:03 news? 2008-09-16 03:03 -!- konrad(~konrad@c-24-16-74-109.hsd1.mn.comcast.net) has joined #tux3 2008-09-16 03:04 new kernel, new kmod-nvidia breakage 2008-09-16 03:04 bh, when zfs stop panicking on boot it might matter ;) 2008-09-16 03:06 sha256, leet 2008-09-16 03:08 -!- Kirantpatil(~kiran@122.167.182.147) has joined #tux3 2008-09-16 03:10 AH! 2008-09-16 03:10 what do you know, someone already wrote all that code, and I don't need to reimplement it... 2008-09-16 03:10 lol 2008-09-16 03:10 what, sha256? 2008-09-16 03:10 ACTION boggles at the concept 2008-09-16 03:11 nah, get_sb_bdev and kill_block_super 2008-09-16 03:11 well, still worth having written half of it by myslef 2008-09-16 03:11 ah 2008-09-16 03:17 maze, true 2008-09-16 03:17 just have to put up with some minor oddities 2008-09-16 03:17 you will still have to reimplement it 2008-09-16 03:17 but not yet 2008-09-16 03:17 probably 2008-09-16 03:18 still try to figure out what it actually accomplishes 2008-09-16 03:18 does it work? 
2008-09-16 03:18 ask away 2008-09-16 03:18 not done yet, so no 2008-09-16 03:18 doesn't compile 2008-09-16 03:19 the kernel is in bad need of documentation 2008-09-16 03:20 flips: I got into a disagreement with some zfs fanboys about their file system 2008-09-16 03:20 I don't like this 2008-09-16 03:20 it's doing something and I don't know what 2008-09-16 03:20 ugh 2008-09-16 03:21 told them that they can't do much other than reformat their volume(s) if there's a corruption since they don't have an integrity checker 2008-09-16 03:21 they didn't get it, oh well 2008-09-16 03:22 lol 2008-09-16 03:22 bh, understandable 2008-09-16 03:23 the more I know about it, the more I'm glad it's not on linux 2008-09-16 03:23 they're kind of stupid and they weren't listening, no sense in arguing with folks like that 2008-09-16 03:23 they think that checksums will save the entire fucking world 2008-09-16 03:24 it'll help, but it's not the full story 2008-09-16 03:24 they need to add some metamagical themas 2008-09-16 03:24 that, and only that will save the world 2008-09-16 03:24 what's that ? 2008-09-16 03:24 the worlds already been saved - over two thousand years ago 2008-09-16 03:24 special form of checksum 2008-09-16 03:24 ... 
2008-09-16 03:24 requires qubit processor 2008-09-16 03:24 uh 2008-09-16 03:25 bear in mind there was some drinking involved tonight 2008-09-16 03:25 celebrating tim's tiwn 2008-09-16 03:25 just get your file system working dude so that folks will stop trash talking behind your back and stuff 2008-09-16 03:25 tim's twins 2008-09-16 03:25 heh 2008-09-16 03:25 let them trash talk 2008-09-16 03:26 only makes them trash talkers 2008-09-16 03:26 and trash talkers always to it behind one's back 2008-09-16 03:26 nothing changes 2008-09-16 03:26 true 2008-09-16 03:26 besides, I know who they are ;) 2008-09-16 03:27 funny how those reports get around 2008-09-16 03:27 anybodny who would be trash talking now is simply somebody who can't read code 2008-09-16 03:30 maze, you must have it by now 2008-09-16 03:30 boot uml and change your life 2008-09-16 03:30 nah, I'm slow 2008-09-16 03:30 oh I have the rootfs 2008-09-16 03:30 the commands are elementary 2008-09-16 03:30 I don't have runnable (or even compileable) code 2008-09-16 03:30 doesn't matter 2008-09-16 03:30 just boot a defconfig 2008-09-16 03:30 addiction is instant 2008-09-16 03:30 faster than crack 2008-09-16 03:31 I ran out of disk space during kernel compile. 2008-09-16 03:31 :P 2008-09-16 03:31 delete gnome 2008-09-16 03:32 ow. 
2008-09-16 03:32 get rid of that centos image MaZe 2008-09-16 03:32 it's on another machine 2008-09-16 03:32 oh :( 2008-09-16 03:32 get out your credit card and order a new hd 2008-09-16 03:32 660GB/$85 newegg 2008-09-16 03:32 rush 2008-09-16 03:32 overnight 2008-09-16 03:33 pay 50% of the cost of the disk ;) 2008-09-16 03:33 lol 2008-09-16 03:33 it's a laptop drive 2008-09-16 03:33 oh 2008-09-16 03:33 pay $150 2008-09-16 03:33 then 2008-09-16 03:33 better idea 2008-09-16 03:33 throw in dvd 2008-09-16 03:33 and burn 2008-09-16 03:34 delete the windows partition 2008-09-16 03:34 you know you don't use it 2008-09-16 03:34 it's only 8gb 2008-09-16 03:34 but I have a compressed fast install dump, that I'll burn 2008-09-16 03:35 nothing installs windows xp quite as fast as bzcat winxp.img.bz2 > /dev/win 2008-09-16 03:36 err, re-installs 2008-09-16 03:36 grab the image after activating? :D 2008-09-16 03:36 of course 2008-09-16 03:36 and updating 2008-09-16 03:36 nice 2008-09-16 03:36 fully patched sp3 2008-09-16 03:38 8g is about 50 compiled kernels 2008-09-16 03:38 delete and be happy 2008-09-16 03:39 well, maybe only 25 2008-09-16 03:39 objs got bloaty 2008-09-16 03:40 actually I had 2G free 2008-09-16 03:40 the build took it all up 2008-09-16 03:40 gross 2008-09-16 03:41 centos feature? 2008-09-16 03:41 ACTION couldn't reisist 2008-09-16 03:41 rather kernel build bloat 2008-09-16 03:41 sure 2008-09-16 03:41 didn't realize it was that bad 2008-09-16 03:41 although maybe because I was doing a test build of a patch 2008-09-16 03:41 there was a decent kernel build once... 
2008-09-16 03:42 build with -g bloats it 2008-09-16 03:42 make allmodconfig; make bzImage 2008-09-16 03:42 oh now 2008-09-16 03:42 just don't 2008-09-16 03:42 was testing a patch 2008-09-16 03:42 make defconfig 2008-09-16 03:43 even defconfig is bloated 2008-09-16 03:43 but not even in the ballpark of what you did 2008-09-16 03:46 hehe 2008-09-16 03:46 I wonder where your kernel pulls modules from - or indeed even if it is modular 2008-09-16 03:48 do you realize your root_fs, compresses down to 24MB with bzip? 2008-09-16 03:48 you should switch to using an initramfs.cpio.gz 2008-09-16 03:48 oh yeah 2008-09-16 03:48 :-/ 2008-09-16 03:48 forgot 2008-09-16 03:48 about my lame uplink 2008-09-16 03:48 WARNING: vmlinux: 'memcpy' exported twice. Previous export was in vmlinux 2008-09-16 03:49 huh? 2008-09-16 03:49 what kernel? 2008-09-16 03:49 2.6.27-rc6 2008-09-16 03:49 make mrproper 2008-09-16 03:49 yeah! 2008-09-16 03:49 panic 2008-09-16 03:49 was 2008-09-16 03:49 ok 2008-09-16 03:49 let's go back to 2.6.26.5 2008-09-16 03:49 good idea 2008-09-16 03:50 dvd burned 2008-09-16 03:50 did windoze 2008-09-16 03:50 die 2008-09-16 03:50 ? 2008-09-16 03:50 hmm? why? 2008-09-16 03:51 oh, not yet 2008-09-16 03:51 just wondering 2008-09-16 03:51 verify 2008-09-16 03:52 I'll rzip my root_fs 2008-09-16 03:52 see how small it gets 2008-09-16 03:52 ok building 2.6.26.5 2008-09-16 03:53 it's trying to rzip 2008-09-16 03:53 kind of dimming the lights here 2008-09-16 03:53 memory wise 2008-09-16 03:53 hmm, wonder if one of the two patches I was testing was what broke rc6 2008-09-16 03:53 probably 2008-09-16 03:53 better exit firefox 2008-09-16 03:54 what the hell is rzip? 2008-09-16 03:54 tridge's zip 2008-09-16 03:54 besides sounding powerfull 2008-09-16 03:54 beats pretty much anything 2008-09-16 03:54 like a chainsaw 2008-09-16 03:54 also author of rsync 2008-09-16 03:55 ls -l root_fs* 2008-09-16 03:55 -rwxr-xr-x 1 root root 18067032 Sep 16 02:24 root_fs.rz 2008-09-16 03:55 ok? 
2008-09-16 03:57 yum install rzip 2008-09-16 03:57 good taste 2008-09-16 03:58 Kernel panic - not syncing: Out of memory and no killable processes... 2008-09-16 03:58 wtf 2008-09-16 03:58 host or guest? 2008-09-16 03:58 console [mc-1] enabled 2008-09-16 03:58 ubda: unknown partition table 2008-09-16 03:58 VFS: Mounted root (ext2 filesystem) readonly. 2008-09-16 03:58 request_module: runaway loop modprobe binfmt-464c 2008-09-16 03:58 request_module: runaway loop modprobe binfmt-464c 2008-09-16 03:58 request_module: runaway loop modprobe binfmt-464c 2008-09-16 03:58 request_module: runaway loop modprobe binfmt-464c 2008-09-16 03:58 request_module: runaway loop modprobe binfmt-464c 2008-09-16 03:58 Kernel panic - not syncing: Out of memory and no killable processes... 2008-09-16 03:58 guest, after all - I'm still here 2008-09-16 03:59 you did the commands above? 2008-09-16 03:59 ah 2008-09-16 03:59 you sure that's ubd0 2008-09-16 03:59 yes 2008-09-16 03:59 cause $ file ../root_fs 2008-09-16 03:59 ../root_fs: Linux rev 0.0 ext2 filesystem data 2008-09-16 03:59 try fsck on the root_fs 2008-09-16 03:59 ah, no nevermind, that runs 2008-09-16 04:00 does um have to run as root? 2008-09-16 04:00 no 2008-09-16 04:00 wonder if it's a 64 bit bug 2008-09-16 04:00 email jeff 2008-09-16 04:00 jdike 2008-09-16 04:01 I would expect it to work though 2008-09-16 04:01 I think somebody at intel must have a 64 bit workstation 2008-09-16 04:01 $ gcc --version 2008-09-16 04:01 gcc (GCC) 4.3.0 20080428 (Red Hat 4.3.0-8) 2008-09-16 04:01 ight 2008-09-16 04:01 night 2008-09-16 04:01 red hat... that's always scary 2008-09-16 04:01 bye bye 2008-09-16 04:01 lol 2008-09-16 04:01 bye 2008-09-16 04:02 oh 2008-09-16 04:02 I know 2008-09-16 04:02 your image is 32-bit 2008-09-16 04:02 my kernel is 64-bit 2008-09-16 04:02 right 2008-09-16 04:02 it can't handle the 32-bit binaries 2008-09-16 04:02 it's supposed to work 2008-09-16 04:02 ah, but is the support code compiled in? 
2008-09-16 04:02 or is it trying to modprobe 2008-09-16 04:02 seeing a 32-bit modprobe 2008-09-16 04:03 it should be? 2008-09-16 04:03 and trying to modprobe 2008-09-16 04:03 should not be modprobing 2008-09-16 04:03 anything 2008-09-16 04:03 not, it's not 2008-09-16 04:03 request_module: runaway loop modprobe binfmt-464c 2008-09-16 04:03 but you're probalby right 2008-09-16 04:03 ;-) 2008-09-16 04:03 it's supposed to work and doesn't 2008-09-16 04:03 kernel bugz 2008-09-16 04:03 ahah 2008-09-16 04:03 turn off module loading 2008-09-16 04:03 in the kernel config 2008-09-16 04:04 rusty's code 2008-09-16 04:04 I take no responsibility 2008-09-16 04:04 except module loading is precisely what I want to debug my module... 2008-09-16 04:05 CONFIG_IA32_EMULATION defaults to no 2008-09-16 04:08 forget modules in uml 2008-09-16 04:08 not the point 2008-09-16 04:08 your whole kernel is a module 2008-09-16 04:08 yeah, will this doesn't work at all 2008-09-16 04:08 modules ain't the problems 2008-09-16 04:08 modules and uml don't work that great 2008-09-16 04:08 for debugging 2008-09-16 04:09 the problem is 64-bit kernel, 32-bit userspace, no 32bit emulation 2008-09-16 04:09 well 2008-09-16 04:09 grab a rootfs 2008-09-16 04:09 yougot another bootable partition? 2008-09-16 04:09 or 2008-09-16 04:09 _compile a 32 bit kernel_ 2008-09-16 04:09 remember this is uml 2008-09-16 04:09 yeah, but how to compile a 32-bit uml 2008-09-16 04:09 ARCH=um32? 2008-09-16 04:09 hmm 2008-09-16 04:09 yah 2008-09-16 04:09 busted isn't it 2008-09-16 04:10 bummer 2008-09-16 04:10 well 2008-09-16 04:10 hmm 2008-09-16 04:10 sure, just set the config option 2008-09-16 04:10 well I do happen to have a handy setarch 2008-09-16 04:10 nope 2008-09-16 04:10 that doesn't work 2008-09-16 04:11 I guess you're right 2008-09-16 04:11 it's lamer than that 2008-09-16 04:11 well 2008-09-16 04:11 you need a 64 bit rootfs 2008-09-16 04:11 you got any other bootable partitions? 
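The `request_module: runaway loop modprobe binfmt-464c` lines are less cryptic than they look. In 2.6-era kernels, `search_binary_handler()` in fs/exec.c requests a module named `binfmt-%04x` when no registered binary format accepts an executable. That name is built from bytes 2 and 3 of the file. For any ELF file those bytes are `L` and `F` (0x4c, 0x46), and a little-endian 16-bit load turns them into 0x464c.

So the guest found an ELF it had no handler for, which fits a 32-bit rootfs under a 64-bit UML kernel without IA32 emulation. Presumably modprobe, itself a 32-bit binary, failed the same way, and the requests piled up until the runaway-loop check fired. The 32-bit versus 64-bit difference is stored in byte 4 of the header (`EI_CLASS`). The sketch below decodes both; the helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define EI_CLASS   4   /* offset of the class byte in e_ident */
#define ELFCLASS32 1
#define ELFCLASS64 2

/* 1 if buf starts with the ELF magic 0x7f 'E' 'L' 'F'. The literal is
 * split so "\x7fE" is not parsed as a single hex escape. */
static int is_elf(const unsigned char *buf, size_t len)
{
	return len >= 5 && memcmp(buf, "\x7f" "ELF", 4) == 0;
}

/* Word size of an ELF image: 32, 64, or 0 if not ELF / unknown class. */
static int elf_bits(const unsigned char *buf, size_t len)
{
	if (!is_elf(buf, len))
		return 0;
	switch (buf[EI_CLASS]) {
	case ELFCLASS32: return 32;
	case ELFCLASS64: return 64;
	default:         return 0;
	}
}

/* Module name derived from bytes 2..3, reproducing the little-endian
 * load the kernel does on x86 regardless of the host running this. */
static void binfmt_module_name(const unsigned char *buf, char *out, size_t outlen)
{
	unsigned int v = buf[2] | (buf[3] << 8);
	snprintf(out, outlen, "binfmt-%04x", v);
}
```

Because every ELF image yields the same `binfmt-464c`, the module name says nothing about the word size. Only the `EI_CLASS` byte distinguishes the 32-bit rootfs from what the 64-bit kernel can run natively.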
2008-09-16 04:12 mac os x ;-) 2008-09-16 04:12 that's why running under kvm would be easier 2008-09-16 04:12 I've now attempted: 2008-09-16 04:12 make defconfig ARCH=um 2008-09-16 04:13 make menuconfig ARCH=um 2008-09-16 04:13 turned on IA32_EMULATION 2008-09-16 04:13 now 2008-09-16 04:13 make oldconfig ARCH=um 2008-09-16 04:13 lots of enters 2008-09-16 04:13 make linux ARCH=um 2008-09-16 04:13 I really should get home and to bed 2008-09-16 04:14 wow, still at the plex 2008-09-16 04:14 you should 2008-09-16 04:14 yeah 2008-09-16 04:14 was playing cards till 11 2008-09-16 04:14 I apologize on behalf of all lame linux hackers 2008-09-16 04:14 about the 64/32 bit thing 2008-09-16 04:15 WARNING: vmlinux: 'memcpy' exported twice. Previous export was in vmlinux 2008-09-16 04:15 but it built... 2008-09-16 04:15 thank god for small meries 2008-09-16 04:15 mercies 2008-09-16 04:16 VFS: Cannot open root device "98:0" or unknown-block(98,0) 2008-09-16 04:16 Please append a correct "root=" boot option; here are the available partitions: 2008-09-16 04:16 Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(98,0) 2008-09-16 04:16 yeah 2008-09-16 04:16 awesome 2008-09-16 04:16 progress 2008-09-16 04:16 not really 2008-09-16 04:16 this time panic was before mount 2008-09-16 04:16 you just need to get the path to your root_fs right now 2008-09-16 04:16 no, it didn't find your rootfs 2008-09-16 04:16 lame error 2008-09-16 04:16 path was right 2008-09-16 04:17 tab completed 2008-09-16 04:17 see your command? 2008-09-16 04:17 $ linux-2.6.26.5/linux ubd0=root_fs 2008-09-16 04:17 [maze@nike l]$ ls -al root_fs 2008-09-16 04:17 -rw-r--r-- 1 maze eng 104857600 2008-09-16 02:24 root_fs 2008-09-16 04:17 ubd0 most not be compiled into the kernel 2008-09-16 04:18 must somehow have gotten unselected 2008-09-16 04:18 true 2008-09-16 04:18 if you didn't use make defconfig, probable 2008-09-16 04:18 you can strace uml 2008-09-16 04:19 I think... 
2008-09-16 04:21 I think problem is 2008-09-16 04:21 make menuconfig was without ARCH=um 2008-09-16 04:21 looks like 64-bit um doesn't have an emulate 32 option 2008-09-16 04:21 that will do you 2008-09-16 04:22 64 bit kernel is supposed to run 32 bit binaries 2008-09-16 04:22 thunking is build into the kernel 2008-09-16 04:25 I don't think it knows how to parse a 32-bit elf header though 2008-09-16 04:26 you might think about that go home and crash option 2008-09-16 04:26 tomorrow get a 64 bit rootfs from somewhere, including copying it from yourself 2008-09-16 04:26 just copy your root onto another disk 2008-09-16 04:26 and plug it in 2008-09-16 04:26 beside your good old root 2008-09-16 04:30 hmm 2008-09-16 04:35 last attempt 2008-09-16 04:35 I should probably make -j2 2008-09-16 04:35 oh well 2008-09-16 04:36 don't make many modules 2008-09-16 04:36 j4, for dual intel 2008-09-16 04:36 really? 2008-09-16 04:36 just make 2008-09-16 04:36 seems to keep 1 cpu pegged 2008-09-16 04:36 two x smt 2008-09-16 04:36 lame smt 2008-09-16 04:36 oh, this is one core duo 2008-09-16 04:36 not double dual 2008-09-16 04:37 right, they kinda dropped smt 2008-09-16 04:37 ddin't they 2008-09-16 04:37 mixed bag 2008-09-16 04:37 now if we actually had a decent migrating os, It could migrate to my desktop 2008-09-16 04:37 seemedlike a good idea 2008-09-16 04:37 not really 2008-09-16 04:37 smt is still coming back 2008-09-16 04:37 dec made it work, intel not so much 2008-09-16 04:37 ht = smt lite 2008-09-16 04:37 migration is another pet peeve of mine - it shouldn't be that frickin hard 2008-09-16 04:38 of course - a decent fs is the first step ;-) 2008-09-16 04:38 it's hard because we like it hard 2008-09-16 04:38 I like reaching not only for the moon, but for the sun and alpha centauri at the same time ;-) 2008-09-16 04:38 that's we we keep getting in our own way 2008-09-16 04:39 they already had it mostly working in 2.4 2008-09-16 04:39 and then they had a beta for 2.6 2008-09-16 
04:39 but I think part of the problem is the overblown syscall interface 2008-09-16 04:39 and its small compared to windows 2008-09-16 04:39 linux should be stripped down, and the linux syscall interface should be a lodable module 2008-09-16 04:40 given multiple enemas 2008-09-16 04:40 from both ends 2008-09-16 04:40 agreed 2008-09-16 04:40 it won't happen in our lifetimes 2008-09-16 04:40 that way you could replace it with whatever 2008-09-16 04:40 heh 2008-09-16 04:40 email linus 2008-09-16 04:40 and experiment with a set of syscalls which would be inherently migratable 2008-09-16 04:40 tell him it's time to make the syscall table loadable 2008-09-16 04:40 and have it per - process selectable 2008-09-16 04:41 is he going to make a laughing stock of me? 2008-09-16 04:41 he's going to say something memorable anyway 2008-09-16 04:41 might be nice to you if he's having a good day 2008-09-16 04:41 and I don't email him first ;) 2008-09-16 04:43 well, balloc_extent_from_range looks ready to try 2008-09-16 04:43 not pretty 2008-09-16 04:43 far from it 2008-09-16 04:43 qemu-kvm -M pc -cpu qemu64 -m 256 -smp 2 -net none -kernel linux-2.6.26.5/arch/x86_64/boot/bzImage -drive file=root_fs,boot=on -append 'ro root=/dev/hda' 2008-09-16 04:43 works 2008-09-16 04:43 :) 2008-09-16 04:43 amazing 2008-09-16 04:43 make mrproper && make clean && make defconfig && make bzImage 2008-09-16 04:43 what about the ARCH=um? 2008-09-16 04:43 that's not uml I guess 2008-09-16 04:44 normal 64-bit kernel 2008-09-16 04:44 yah, and a real boot 2008-09-16 04:44 me and uml send our regrets 2008-09-16 04:44 but... 2008-09-16 04:44 didn't see you boot 2008-09-16 04:44 huh? 2008-09-16 04:44 ah... 2008-09-16 04:45 oh 2008-09-16 04:45 trying to root me remotely? 
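The loadable, per-process syscall layer floated here is only an idea in the log. The sketch below is a hedged illustration of the dispatch shape that idea implies. Each hypothetical `struct task` points to its own `struct syscall_table`, so the entry point indexes the caller's table rather than a single global one. All names, the toy handlers, and the three-argument calling convention are assumptions made for the example.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type: up to three integer arguments. */
typedef long (*syscall_fn)(long a0, long a1, long a2);

/* Hypothetical per-process syscall "personality". */
struct syscall_table {
	const char *name;
	size_t nr;                /* number of entries in fn[] */
	const syscall_fn *fn;
};

/* Hypothetical task: each process carries its own table pointer. */
struct task {
	const struct syscall_table *sys;
};

/* Dispatch through the calling task's table, not a global one. */
static long dispatch(const struct task *t, size_t nr, long a0, long a1, long a2)
{
	if (nr >= t->sys->nr || t->sys->fn[nr] == NULL)
		return -ENOSYS;
	return t->sys->fn[nr](a0, a1, a2);
}

/* Two toy personalities that implement call 0 differently. */
static long native_version(long a0, long a1, long a2)
{
	(void)a0; (void)a1; (void)a2;
	return 26;
}

static long migratable_version(long a0, long a1, long a2)
{
	(void)a0; (void)a1; (void)a2;
	return 1;
}

static const syscall_fn native_fns[] = { native_version };
static const syscall_fn migratable_fns[] = { migratable_version, NULL };

static const struct syscall_table native = { "native", 1, native_fns };
static const struct syscall_table migratable = { "migratable", 2, migratable_fns };
```

In this model, switching a process to a different ABI is one pointer assignment (`t->sys = &migratable`). The hard part alluded to in the chat is deciding which kernel state a migratable call set may touch, and that is not modelled here.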
2008-09-16 04:45 qemu 2008-09-16 04:45 point taken 2008-09-16 04:45 ;-) 2008-09-16 04:45 I'll leave that to shapor 2008-09-16 04:45 I thought you had some sort of ping in the image 2008-09-16 04:45 ok, you are qemu and I am uml 2008-09-16 04:45 yeah, looks like I'll stick to qemu 2008-09-16 04:45 seems to work fine 2008-09-16 04:45 somebody needs to findout why 32 bit root_fs doesn't work on 64 bit kernel 2008-09-16 04:46 I'm sure jdike will be fascinated 2008-09-16 04:46 there's some mongo dir and stuff in their 2008-09-16 04:46 I'm guessing nobody's even tried 2008-09-16 04:46 balloc_extent_from_range is ready to try 2008-09-16 04:46 not tonight 2008-09-16 04:46 or rather 2008-09-16 04:46 not now 2008-09-16 04:46 I've got a meeting at 9:30 2008-09-16 04:47 argh 2008-09-16 04:47 what rootfs did you use with qemu? 2008-09-16 04:47 yours 2008-09-16 04:47 don't go hom 2008-09-16 04:47 64-bit kernel, 32-bit rootfs - works fine 2008-09-16 04:47 sleep on the massage chair 2008-09-16 04:47 good, like it's supposed to 2008-09-16 04:47 sounds like a jdike issue 2008-09-16 04:47 yeah, I think I'll do that 2008-09-16 04:48 set it on repeat 2008-09-16 04:49 so there probably wasn't a problem with 2.6.27-rc6 after all 2008-09-16 04:49 nice to have a stable point to assign blame from 2008-09-16 04:50 7875 flips 2008-09-16 04:50 1715 MaZe 2008-09-16 04:50 1289 shapor 2008-09-16 04:50 882 bh 2008-09-16 04:50 668 konrad 2008-09-16 04:50 380 tim_dimm 2008-09-16 04:50 128 RazvanM 2008-09-16 04:50 113 vandenoever 2008-09-16 04:50 96 flipz 2008-09-16 04:50 lol 2008-09-16 04:50 irc? 2008-09-16 04:50 yeah 2008-09-16 04:50 as if I don't get enough typing 2008-09-16 04:50 maze is rising 2008-09-16 04:50 am I? 2008-09-16 04:51 think so 2008-09-16 04:51 I thought you were going up much much faster 2008-09-16 04:51 race for 2nd 2008-09-16 04:51 what do you mean race for 2nd? 2008-09-16 04:51 flipz is a real lame 2008-09-16 04:51 don't you tink? 
2008-09-16 04:51 heh 2008-09-16 04:52 better get summa that sleep 2008-09-16 04:52 yeah, that's probably because he does the coding, while you sit around on irc 2008-09-16 04:52 I could be accused of contributing to the delinquency of a googler 2008-09-16 04:52 you could 2008-09-16 04:53 return found - contig + 1; <- tomorrow we see if it works 2008-09-16 04:53 new extent balloc 2008-09-16 04:53 anyway - I'm gone 2008-09-16 04:53 bye 2008-09-16 04:53 me 2 2008-09-16 07:11 -!- kushal(~kushal@121.246.32.210) has joined #tux3 2008-09-16 08:01 -!- Kirantpatil(~kiran@122.167.215.81) has joined #tux3 2008-09-16 08:01 hello list 2008-09-16 08:02 how many hours to go for the Part-3 of tux3 university ? 2008-09-16 09:53 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-16 10:25 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-16 10:45 -!- kushal(~kushal@121.246.33.21) has joined #tux3 2008-09-16 10:51 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-16 10:55 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-16 10:58 howdy 2008-09-16 10:58 when's the next tux3 university scheduled? 2008-09-16 11:37 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-16 11:50 -!- Kirantpatil(~kiran@122.167.176.249) has joined #tux3 2008-09-16 11:52 -!- Kirantpatil(~kiran@122.167.176.249) has left #tux3 2008-09-16 12:00 -!- shapor(~shapor@yzf.shapor.com) has joined #tux3 2008-09-16 12:00 ACTION is back 2008-09-16 12:00 hrm thats weird i got banned from oftc.net 2008-09-16 12:00 maybe my bot went crazy 2008-09-16 12:03 ah no i guess it was everyone 2008-09-16 12:03 heh 2008-09-16 12:52 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-16 13:17 folks 2008-09-16 13:58 -!- shapor(~shapor@yzf.shapor.com) has joined #tux3 2008-09-16 14:06 morning 2008-09-16 14:07 it's an extent morning for #tux3 2008-09-16 14:38 hey 2008-09-16 14:38 hi bh 2008-09-16 14:39 how's it going ? 
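Taken together, the `balloc_extent_from_range` signature pasted earlier and the `return found - contig + 1;` line suggest a first-fit scan for a run of free blocks. Below is a minimal userspace sketch built on several assumptions, and it is not tux3's code:

- The free map is a plain in-memory bitmap in which bit i of byte i/8 is set when block i is in use. This is little-endian bit order, not the big-endian scan mentioned in the log.
- A hypothetical `struct bitmap` replaces the inode argument.
- `block_t` is signed, so -1 can mean "no room".
- `bfree_extent` is an illustrative counterpart for freeing.

```c
#include <stdint.h>

typedef int64_t block_t;   /* signed here so -1 can signal failure */

/* Hypothetical free-space bitmap: bit set = block in use. */
struct bitmap {
	uint8_t *bits;
	block_t blocks;
};

static int bitmap_test(const struct bitmap *map, block_t b)
{
	return (map->bits[b >> 3] >> (b & 7)) & 1;
}

static void bitmap_set(struct bitmap *map, block_t b)
{
	map->bits[b >> 3] |= (uint8_t)(1u << (b & 7));
}

static void bitmap_clear(struct bitmap *map, block_t b)
{
	map->bits[b >> 3] &= (uint8_t)~(1u << (b & 7));
}

/* First fit: find `length` contiguous free blocks in [start, start + count),
 * mark them in use and return the first one, or -1 if none fits. */
static block_t balloc_extent_from_range(struct bitmap *map, block_t start,
					block_t count, unsigned length)
{
	block_t end = start + count, found;
	unsigned contig = 0;

	if (length == 0)
		return -1;
	if (end > map->blocks)
		end = map->blocks;
	for (found = start; found < end; found++) {
		if (bitmap_test(map, found)) {
			contig = 0;             /* run broken by a used block */
			continue;
		}
		if (++contig == length) {
			block_t first = found - contig + 1;
			for (block_t b = first; b <= found; b++)
				bitmap_set(map, b);
			return first;
		}
	}
	return -1;
}

/* Free an extent by clearing its bits. */
static void bfree_extent(struct bitmap *map, block_t first, unsigned length)
{
	for (unsigned i = 0; i < length; i++)
		bitmap_clear(map, first + i);
}
```

When the scan reaches the last block of a qualifying run, `found` points at that block and `contig` equals `length`, so `found - contig + 1` is the run's first block.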
2008-09-16 14:41 coding bitops 2008-09-16 14:41 fun 2008-09-16 14:41 extents are fun 2008-09-16 14:41 been researching tree locking? 2008-09-16 14:41 there's a lot written on the subject 2008-09-16 14:55 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-16 14:57 ok, looks like we have extent enabled balloc now 2008-09-16 14:57 but 2008-09-16 14:57 lots still to do 2008-09-16 14:57 on extents 2008-09-16 14:58 messy messy 2008-09-16 16:02 tux3 noses over 7K lines 2008-09-16 16:02 decimal K 2008-09-16 16:05 Sun is using mercurial for its new project site 2008-09-16 16:06 I think that means mercurial wins 2008-09-16 16:06 big props to matt 2008-09-16 16:06 http://projectkenai.com/projects/xvmserver/sources/earlyaccess/show 2008-09-16 16:06 clueful of Sun 2008-09-16 16:06 I'm shocked ;) 2008-09-16 16:18 projectkenai is new and doesn't have git yet, but they plan to 2008-09-16 16:18 if it's the same site I'm remembering 2008-09-16 16:19 yep, it is 2008-09-16 16:58 flips: nice 2008-09-16 17:02 -!- kbingham(~kbingham@92.9.62.202) has joined #tux3 2008-09-16 17:17 I'm thinking back on a design decision I made pretty early for the prototype - to depart from the usual kernel get_block model and have tux3 actually initiate the IO at that point, unlike get_block where the fs just tells the VFS where a particular logical block is supposed to be read/written physically 2008-09-16 17:17 I am increasingly getting the feeling that that decision was right 2008-09-16 17:17 especially as I get working on extents 2008-09-16 17:18 and took a look at how the btrfs guys do extents 2008-09-16 17:18 that is scary 2008-09-16 17:18 looks like they want to go make big changes to the vfs 2008-09-16 17:18 without really considering the alternatives 2008-09-16 17:18 I might not have looked close enough, but that's what it looks like on first blush 2008-09-16 17:28 when are atomic commits going to work ? 
2008-09-16 17:28 after extents 2008-09-16 17:29 or sooner if you want to code it 2008-09-16 17:29 fun 2008-09-16 17:31 flips: what was that disk failure article you were mentioning last night? 2008-09-16 17:32 got a link? 2008-09-16 17:32 just a sec 2008-09-16 17:32 http://alumnit.ca/~apenwarr/log/?m=200809#08 2008-09-16 17:40 interesting 2008-09-16 17:41 nearly sk8 oclock 2008-09-16 17:42 ACTION is getting tired of checking in extent bits 2008-09-16 17:43 wow i'd never heard of ionice 2008-09-16 17:43 awesome! 2008-09-16 17:43 "Linux supports io scheduling priorities and classes since 2.6.13 with the CFQ io scheduler." 2008-09-16 17:43 !! 2008-09-16 17:44 well 2008-09-16 17:44 have i been living under a rock? 2008-09-16 17:44 don't get _too_ excited 2008-09-16 17:44 cfq is, um 2008-09-16 17:44 you know 2008-09-16 17:44 there's a reason it's not the default 2008-09-16 17:44 just the fact there is such an interface is reassuring 2008-09-16 17:44 yes 2008-09-16 17:45 pluggable disk elevators 2008-09-16 17:45 if it doesn't work as advertised thats simply a bug to file 2008-09-16 17:45 been in for 4-5 years 2008-09-16 17:45 then someone smarter than me can fix it ;) 2008-09-16 17:45 danger is when somebody less smart than you fixes it 2008-09-16 17:47 ok, we have extent allocate and extent free now 2008-09-16 17:47 not really great versions of, but simple and serviceable for now 2008-09-16 17:47 that was the easy part 2008-09-16 18:06 -!- cdk(~chinmay@121.246.33.227) has joined #tux3 2008-09-16 18:29 -!- RalucaM(~ral@londo.cnds.jhu.edu) has joined #tux3 2008-09-16 18:33 -!- Aks(~ankitsriv@123.239.79.30) has joined #tux3 2008-09-16 18:34 -!- Aks(~ankitsriv@123.239.79.30) has left #tux3 2008-09-16 18:53 -!- kbingham(~kbingham@92.8.19.189) has joined #tux3 2008-09-16 19:03 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-16 19:09 -!- kbingham(~kbingham@92.8.3.46) has joined #tux3 2008-09-16 19:13 -!- stargazr5(~gauravstt@59.95.18.36) has 
joined #tux3 2008-09-16 19:20 -!- Kirantpatil(~kiran@122.167.218.72) has joined #tux3 2008-09-16 19:20 -!- Kirantpatil(~kiran@122.167.218.72) has left #tux3 2008-09-16 19:43 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-16 19:45 15 minutes and counting 2008-09-16 19:54 OT: http://www.noodleson.com/store/images/nongshim/vegetal.jpg 2008-09-16 19:57 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-16 19:57 seems on topic to me 2008-09-16 19:57 :-) 2008-09-16 19:57 hi ralucam 2008-09-16 19:57 OT but important 2008-09-16 19:57 my new chair is comfy 2008-09-16 19:57 hey tim 2008-09-16 19:58 yes 2008-09-16 19:58 online 2008-09-16 19:58 that's important 2008-09-16 19:58 and this watermellon is delicious 2008-09-16 19:58 hi everybody 2008-09-16 19:58 everybody warming up their browers? 2008-09-16 19:59 ACTION is trying to do at least a modest part of the homework... 2008-09-16 19:59 standard precaution is to restart firefox 2008-09-16 19:59 so it doesn't go oom when I'm trying to talk ;) 2008-09-16 19:59 ugh, oh right, what was the homework? 2008-09-16 20:00 read the superblock? ;-) 2008-09-16 20:00 flips: homework is: know how the root dir is loaded and initialized, and now that differs from how any other inode is opened 2008-09-16 20:00 it was about loading the root directory 2008-09-16 20:00 -!- pranith(7aa040b1@webchat.mibbit.com) has joined #tux3 2008-09-16 20:00 and what did we find? 2008-09-16 20:01 that it gets loaded explicitely 2008-09-16 20:01 because... 2008-09-16 20:01 because dir lookup doesn't work 2008-09-16 20:01 well it's the mount point 2008-09-16 20:02 because there is no dir to look up in 2008-09-16 20:02 root of the tree and all that 2008-09-16 20:02 ACTION is searching for s_root... 
2008-09-16 20:02 so we have to open the root dir "manually", using functionality that normally gets called by something like ext2_lookup 2008-09-16 20:02 not quite that function 2008-09-16 20:02 anyway 2008-09-16 20:03 we're starting somewhere different today 2008-09-16 20:03 because maze wants to go faster ;) 2008-09-16 20:03 so let's go to sys_write 2008-09-16 20:03 I'm guiltless - I tell you... 2008-09-16 20:03 http://lxr.linux.no/linux+v2.6.26.5/fs/dcache.c#L1062 2008-09-16 20:03 ok ok ok ok 2008-09-16 20:03 I think we killed lxr 2008-09-16 20:04 seems 2008-09-16 20:04 next time I'll go there before I announce the destination ;) 2008-09-16 20:04 http://lxr.linux.no/linux+v2.6.26.5/fs/read_write.c#L370 2008-09-16 20:04 http://lxr.linux.no/linux+v2.6.26.5/fs/read_write.c#L370 2008-09-16 20:04 it works from here 2008-09-16 20:04 works here too 2008-09-16 20:04 Razvan's always faster ;-) 2008-09-16 20:04 ok, who wants to walk down into it? 2008-09-16 20:04 instead of me this time? 2008-09-16 20:05 seems to me, razvanm does that pretty well 2008-09-16 20:05 you know the first few layers 2008-09-16 20:05 it's just the same idea as sys_open 2008-09-16 20:05 ACTION is doesn't too much about fs yet :( 2008-09-16 20:05 you know how to poke down into a syscall though 2008-09-16 20:05 file_pos_read and file_pos_write are probably to fetch and store the current file offset 2008-09-16 20:05 just keep clicking until you see something that isn't obvious 2008-09-16 20:06 let's look at those 2008-09-16 20:06 fget_light and fput_light must be fd to struct file lookup with locking 2008-09-16 20:06 so all that's left is vfs_write 2008-09-16 20:06 pretty simple (file_pos_read/write) 2008-09-16 20:06 which was kind of obvious to begin with ;-) 2008-09-16 20:06 I don't know why they're even abstracted 2008-09-16 20:07 fget/put_light are demented 2008-09-16 20:07 two of the most subtle and demented functions in the entire kernel 2008-09-16 20:07 don't worry about them today ;) 
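The sys_write path just walked can be modeled in plain userspace C. It looks up the struct file for the descriptor, reads the offset, calls vfs_write, stores the offset back and drops the reference. This is a sketch under stated assumptions, not kernel code. `toy_fget` and `toy_fput` are hypothetical stand-ins for fget_light and fput_light: they keep a simple pin count and none of the real functions' lock-free trickery. `toy_vfs_write` just copies into a 64-byte array.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical toy "struct file": a pin count, the current offset,
 * and a small in-memory backing store standing in for the disk. */
struct toy_file {
    int refcount;
    long long f_pos;
    char data[64];
};

#define TOY_MAX_FDS 4
static struct toy_file *toy_fd_table[TOY_MAX_FDS];

/* Stand-in for fget_light(): map an fd to its file and pin it so it
 * cannot go away while the write is in progress. */
static struct toy_file *toy_fget(unsigned int fd)
{
    if (fd >= TOY_MAX_FDS || !toy_fd_table[fd])
        return NULL;
    toy_fd_table[fd]->refcount++;
    return toy_fd_table[fd];
}

/* Stand-in for fput_light(): drop the pin. */
static void toy_fput(struct toy_file *file)
{
    file->refcount--;
}

/* Stand-in for vfs_write(): copy into the backing store at *pos and
 * advance *pos; a write running past the end is cut short. */
static ssize_t toy_vfs_write(struct toy_file *file, const char *buf,
                             size_t count, long long *pos)
{
    if (*pos < 0 || *pos > (long long)sizeof(file->data))
        return -EINVAL;
    size_t room = sizeof(file->data) - (size_t)*pos;
    if (count > room)
        count = room;
    memcpy(file->data + *pos, buf, count);
    *pos += (long long)count;
    return (ssize_t)count;
}

/* Same shape as sys_write(): fd -> file, read pos, write, store pos. */
ssize_t toy_sys_write(unsigned int fd, const char *buf, size_t count)
{
    ssize_t ret = -EBADF;
    struct toy_file *file = toy_fget(fd);

    if (file) {
        long long pos = file->f_pos;      /* file_pos_read() */
        ret = toy_vfs_write(file, buf, count, &pos);
        file->f_pos = pos;                /* file_pos_write() */
        toy_fput(file);
    }
    return ret;
}
```

The offset is copied into a local, handed down by pointer, and written back afterwards. That is why file_pos_read and file_pos_write look almost too trivial to be separate helpers.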
2008-09-16 20:07 they were conceived by a vile an twisted mind, and get to live because they are fast 2008-09-16 20:07 what's demented about them? 2008-09-16 20:07 heh 2008-09-16 20:07 later 2008-09-16 20:07 really 2008-09-16 20:08 google if you must 2008-09-16 20:08 http://lxr.linux.no/linux+v2.6.26.5/fs/read_write.c#L313 2008-09-16 20:08 ok, that's vfs_write 2008-09-16 20:08 suffice to say that they keep our file from disappearing while we are writing to it 2008-09-16 20:08 it would be bad otherwise 2008-09-16 20:08 right - locking 2008-09-16 20:08 razvanm, good, and what do you see there? 2008-09-16 20:09 a bunch of permission checks 2008-09-16 20:09 and then a f_op->write call 2008-09-16 20:09 f_op->write if exists 2008-09-16 20:09 typical, right? 2008-09-16 20:09 provided it's available 2008-09-16 20:09 what you don't see is any locks being taken 2008-09-16 20:09 ot do_sync_write otherwise 2008-09-16 20:09 ot = or 2008-09-16 20:09 there is _very little locking_ in this path 2008-09-16 20:09 helping make it fast 2008-09-16 20:09 and a cute inc_syscw 2008-09-16 20:09 -!- kbingham(~kbingham@92.8.217.48) has joined #tux3 2008-09-16 20:10 the consequence of that is, the filesystem can be hit in a very parallel way 2008-09-16 20:10 what is rw_verify_area? 2008-09-16 20:10 probably locking 2008-09-16 20:10 sometimes in ways that don't make sense, or are from buggy, racy applications, and the filesystem has to do something reasonable 2008-09-16 20:10 i.e., not crash and not corrupt 2008-09-16 20:10 rw_verify_area... 
hmm 2008-09-16 20:10 as in byte-range locks 2008-09-16 20:10 newish thing 2008-09-16 20:11 no sorry 2008-09-16 20:11 it's implementing flock 2008-09-16 20:11 bad name 2008-09-16 20:11 very 2008-09-16 20:11 http://lxr.linux.no/linux+v2.6.26.5/fs/read_write.c#L196 2008-09-16 20:11 we don't care about it really 2008-09-16 20:11 I'd guess it checks no-one else has locked the area we're about to write to 2008-09-16 20:11 normally nobody uses flock 2008-09-16 20:12 crufty old baggage 2008-09-16 20:12 more interesting that selinux has a hook there 2008-09-16 20:12 the "security_*" <- typical selinux hook 2008-09-16 20:12 flips: inc_syscw.. tsk->syscw++ 2008-09-16 20:12 but this is not really interesting, let's pop back out and go deeper 2008-09-16 20:12 that's a generic security hook though right? 2008-09-16 20:12 yes 2008-09-16 20:13 I forget what we call the generic harness 2008-09-16 20:13 http://lxr.linux.no/linux+v2.6.26.5/fs/read_write.c#L313 <- back here 2008-09-16 20:13 next we see that meme again 2008-09-16 20:14 our fs can either completely replace the write logic with its own, or the vfs will supply a basic framework and call lower level methods in the fs 2008-09-16 20:14 327 if (file->f_op->write) 2008-09-16 20:14 328 ret = file->f_op->write(file, buf, count, pos); 2008-09-16 20:14 very few fs's will use this hook 2008-09-16 20:15 i thought we were supposed to use the vfs framework... 
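The dispatch quoted above from read_write.c can be sketched with a function-pointer table. It calls `file->f_op->write` when the filesystem supplies one and otherwise falls back to the VFS's do_sync_write, after rw_verify_area's range checks. All `toy_` names are hypothetical. The mandatory-lock check and the security hook are left out. `toy_last_path` exists only so a caller can see which branch ran.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/types.h>

struct toy_file;

/* Hypothetical subset of file_operations: either hook may be NULL. */
struct toy_file_operations {
    ssize_t (*write)(struct toy_file *, const char *, size_t, long long *);
    ssize_t (*aio_write)(struct toy_file *, const char *, size_t, long long *);
};

struct toy_file {
    int writable;                              /* models FMODE_WRITE */
    const struct toy_file_operations *f_op;
};

enum toy_path { TOY_PATH_NONE, TOY_PATH_FS_WRITE, TOY_PATH_SYNC_WRITE };
static enum toy_path toy_last_path = TOY_PATH_NONE;

/* Range checks in the spirit of rw_verify_area(). */
static int toy_rw_verify_area(long long pos, size_t count)
{
    if ((ssize_t)count < 0)                    /* count looks negative */
        return -EINVAL;
    if (pos < 0 || count > (unsigned long long)(LLONG_MAX - pos))
        return -EINVAL;                        /* pos + count would wrap */
    return 0;
}

/* A filesystem that takes over the whole write path via ->write. */
static ssize_t toy_fs_write(struct toy_file *file, const char *buf,
                            size_t count, long long *pos)
{
    (void)file; (void)buf;
    toy_last_path = TOY_PATH_FS_WRITE;
    *pos += (long long)count;
    return (ssize_t)count;
}

/* A filesystem that only supplies ->aio_write and lets the VFS drive. */
static ssize_t toy_generic_aio_write(struct toy_file *file, const char *buf,
                                     size_t count, long long *pos)
{
    (void)file; (void)buf;
    *pos += (long long)count;
    return (ssize_t)count;
}

/* Stand-in for do_sync_write(): forwards to ->aio_write. */
static ssize_t toy_do_sync_write(struct toy_file *file, const char *buf,
                                 size_t count, long long *pos)
{
    toy_last_path = TOY_PATH_SYNC_WRITE;
    return file->f_op->aio_write(file, buf, count, pos);
}

ssize_t toy_vfs_write(struct toy_file *file, const char *buf, size_t count,
                      long long *pos)
{
    if (!file->writable)
        return -EBADF;
    if (!file->f_op || (!file->f_op->write && !file->f_op->aio_write))
        return -EINVAL;
    int err = toy_rw_verify_area(*pos, count);
    if (err)
        return err;
    if (file->f_op->write)                     /* fs replaces the logic */
        return file->f_op->write(file, buf, count, pos);
    return toy_do_sync_write(file, buf, count, pos);  /* VFS framework */
}

const struct toy_file_operations toy_custom_fops = { .write = toy_fs_write };
const struct toy_file_operations toy_generic_fops = { .aio_write = toy_generic_aio_write };
```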
2008-09-16 20:15 http://lxr.linux.no/linux+v2.6.26.5/fs/read_write.c#L288 2008-09-16 20:15 almost all continue on down into do_sync_write 2008-09-16 20:15 which is still the vfs 2008-09-16 20:15 most filesystems don't want to have the responsibility of doing all the things the vfs is about to do now 2008-09-16 20:16 -!- amey(~amey@116.73.35.180) has joined #tux3 2008-09-16 20:16 http://lxr.linux.no/linux+v2.6.26.5/fs/read_write.c#L288 <- do_sync_write 2008-09-16 20:16 so, internally the kernel is kind of aio oriented 2008-09-16 20:16 asynchronous IO 2008-09-16 20:17 and synchronous IO is just a shell around it of the form "start and IO op; wait on a wait queue until its done" 2008-09-16 20:17 we see that here 2008-09-16 20:17 very simple... if you don't poke into the details 2008-09-16 20:17 we will, but later 2008-09-16 20:17 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/file.c#L50 2008-09-16 20:17 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2487 2008-09-16 20:17 so now... we lose the trail 2008-09-16 20:18 because the vfs calls the real write action through a variable 2008-09-16 20:18 any suggestions how we can pick up that trail again? 2008-09-16 20:18 aio_write :P 2008-09-16 20:18 filp->f_op->aio_write 2008-09-16 20:18 right 2008-09-16 20:18 we can grep the entire kernel for it 2008-09-16 20:19 or we can go back to ext2/inode.c 2008-09-16 20:19 where I know it is ;) 2008-09-16 20:19 let's do that 2008-09-16 20:19 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2364 2008-09-16 20:19 you're getting ahead ;) 2008-09-16 20:19 let's see how we get there 2008-09-16 20:20 and I was wrong about the file 2008-09-16 20:20 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/file.c#L50 2008-09-16 20:20 interesting 2008-09-16 20:21 ? 
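The idea that synchronous IO is just "start an IO op; wait until it's done" wrapped around asynchronous IO can be modeled without threads. In this sketch the async method either finishes inline or reports that the request was queued. The synchronous wrapper then waits for the completion. In 2.6.26, do_sync_write does this with a kiocb, -EIOCBQUEUED and wait_on_sync_kiocb. Here `TOY_QUEUED` is an arbitrary hypothetical status. The wait is simulated by polling a fake device instead of sleeping on a wait queue.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical "queued, completion will follow" status; the kernel
 * uses -EIOCBQUEUED for this role, the value here is arbitrary. */
#define TOY_QUEUED 1000

struct toy_kiocb {
    int done;          /* set by the completion callback */
    ssize_t result;    /* bytes written once complete */
    size_t pending;    /* bytes handed to the "device" */
};

static struct toy_kiocb *toy_inflight;   /* the one request in flight */
static int toy_queued_count;             /* how often we had to wait */

/* Completion side: what an end-of-IO handler would do. */
static void toy_complete(struct toy_kiocb *iocb)
{
    iocb->result = (ssize_t)iocb->pending;
    iocb->done = 1;
}

/* Async write: small writes finish inline, large ones get queued. */
static ssize_t toy_aio_write(struct toy_kiocb *iocb, size_t count)
{
    if (count <= 512)
        return (ssize_t)count;
    iocb->pending = count;
    toy_inflight = iocb;
    toy_queued_count++;
    return -TOY_QUEUED;
}

/* Simulated device: finish whatever request is in flight. */
static void toy_run_device(void)
{
    if (toy_inflight) {
        toy_complete(toy_inflight);
        toy_inflight = NULL;
    }
}

/* Synchronous write = start the async op, then wait for completion. */
ssize_t toy_do_sync_write(size_t count)
{
    struct toy_kiocb iocb = { 0, 0, 0 };
    ssize_t ret = toy_aio_write(&iocb, count);

    if (ret == -TOY_QUEUED) {
        while (!iocb.done)        /* kernel: sleep on a wait queue here */
            toy_run_device();
        ret = iocb.result;
    }
    return ret;
}
```

The kiocb can live on the caller's stack only because the wrapper does not return until the completion side has filled in the result.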
2008-09-16 20:21 now we see that ext2 just fills that in with a generic function 2008-09-16 20:21 that maze already found 2008-09-16 20:21 so lets clikc on it and go to filemap 2008-09-16 20:21 even this a fs is not interesting in implementing it :D 2008-09-16 20:21 that's right 2008-09-16 20:21 ext2 mostly lets the vfs do everything for it 2008-09-16 20:21 and its still 7,500 lines long 2008-09-16 20:21 worth considering what's in those 7,500 lines 2008-09-16 20:22 keep in mind that the VFS was essentially created just by taking a functioning filesystem and chopping it in half 2008-09-16 20:22 the top half, which became the vfs 2008-09-16 20:23 and the bottom half, which is a bunch of specific methods for doing things like figuring out the position of a block on disk 2008-09-16 20:23 and the bottom half which became the fs drivers 2008-09-16 20:23 ext2 should still have something to say about the write... 2008-09-16 20:23 which because ext2 and all its friends 2008-09-16 20:23 might not 2008-09-16 20:23 ext2 is not journaled 2008-09-16 20:23 might just have a get_disk_block(file, offset) 2008-09-16 20:23 ext2 is happy to let the vfs take over completely here, but of course, the vfs will come back to ext2 at some point 2008-09-16 20:23 why not ext3? 2008-09-16 20:23 and allocate/free_disk_block 2008-09-16 20:23 we will get there in about 5-10 minutes 2008-09-16 20:24 ok 2008-09-16 20:24 for comparison, you could look at ext3/file.c 2008-09-16 20:24 let's do that later 2008-09-16 20:24 http://lxr.linux.no/linux+v2.6.26.5/+code=generic_file_aio_write 2008-09-16 20:24 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/file.c#L50 2008-09-16 20:24 ext2 is not journaled - so each file is just a read/write collection of blocks on disk 2008-09-16 20:25 even ext3 doesn't normally journal data 2008-09-16 20:25 so all you need is the ability to lookup a given files/offsets block location on disk and you can read/write just fine 2008-09-16 20:25 but it can... 
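The "bottom half" role described above can be illustrated with an ext2-style table of direct block pointers. In that role, a non-journaled filesystem mostly has to answer "which disk block holds this file offset?", the get_disk_block(file, offset) idea. This is a toy under stated assumptions. Real ext2 has 12 direct pointers plus indirect blocks and does this work in its get_block callback, whose `create` flag this sketch imitates. The 1 KiB block size, the bump allocator and the `toy_` names are hypothetical simplifications.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_BLOCK_SIZE 1024u
#define TOY_NDIR 12u               /* ext2 also has 12 direct pointers */

struct toy_inode {
    uint32_t i_block[TOY_NDIR];    /* 0 means a hole, nothing allocated */
};

static uint32_t toy_next_free = 100;   /* trivial bump allocator */

/* Map a byte offset in the file to a physical block number.
 * With create != 0 a hole gets a freshly allocated block.
 * Returns 0 with *phys set, -ENOENT for an unmapped hole, or -EFBIG
 * past the direct blocks (this toy has no indirect blocks). */
int toy_get_block(struct toy_inode *inode, uint64_t offset, int create,
                  uint32_t *phys)
{
    uint64_t iblock = offset / TOY_BLOCK_SIZE;   /* logical block */

    if (iblock >= TOY_NDIR)
        return -EFBIG;
    if (!inode->i_block[iblock]) {
        if (!create)
            return -ENOENT;
        inode->i_block[iblock] = toy_next_free++;
    }
    *phys = inode->i_block[iblock];
    return 0;
}
```

Reads call it with create = 0 and treat a hole as zeroes. Writes call it with create = 1. That asymmetry is much of what the generic read and write paths delegate to the filesystem.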
2008-09-16 20:25 next step: http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2364 2008-09-16 20:25 yes, and so it must supply different methods for its different journalling options 2008-09-16 20:26 http://lxr.linux.no/linux+v2.6.26.5/fs/ext3/file.c#L113 2008-09-16 20:26 not *must*, but that is what it does 2008-09-16 20:26 113 .aio_read = generic_file_aio_read, 2008-09-16 20:26 114 .aio_write = ext3_file_write, 2008-09-16 20:26 so ext3 has it's own write, but uses the generic read 2008-09-16 20:26 thanks razvanm 2008-09-16 20:26 notice that generic_file_aio_write didn't really do much 2008-09-16 20:27 generic read but custom write... interesting 2008-09-16 20:27 jsut took care of some options 2008-09-16 20:27 optional unix semantics 2008-09-16 20:27 razvanm, sure, no journal needed on read 2008-09-16 20:28 finally, __generic_file_aio_write_nolock is doing something 2008-09-16 20:28 not much... but more than the others 2008-09-16 20:28 aaaa... ext3 :D 2008-09-16 20:28 since on read you can just let the generic file/offset block lookup code handle it, but on write - you might need to go through the journal if the right mount optiones (data=ordered I think) were used 2008-09-16 20:28 or data=journaled - never sure 2008-09-16 20:28 here we see readv being implemented 2008-09-16 20:28 um 2008-09-16 20:28 writev 2008-09-16 20:29 where? 2008-09-16 20:29 generic_segment_checks(iov, &nr_segs, &ocount, VERIFY_READ); 2008-09-16 20:29 nr_segs... writev segs 2008-09-16 20:29 not important 2008-09-16 20:29 easy enough to understand 2008-09-16 20:30 is that verifying we can read the ram the user passed us? 2008-09-16 20:30 probably 2008-09-16 20:30 let's find out 2008-09-16 20:30 1149 /* 2008-09-16 20:30 1150 * If any segment has a negative length, or the cumulative 2008-09-16 20:30 1151 * length ever wraps negative then return -EINVAL. 
2008-09-16 20:30 1152 */ 2008-09-16 20:31 no, just checking for properly formed structs 2008-09-16 20:31 if (access_ok(access_flags, iv->iov_base, iv->iov_len)) 2008-09-16 20:31 I htink it does full access checks 2008-09-16 20:31 security 2008-09-16 20:31 note the return -EFAULT 2008-09-16 20:32 so we will rely on the mmu 2008-09-16 20:32 to fault 2008-09-16 20:32 and sometimes check for faulting contitions by hand 2008-09-16 20:32 http://lxr.linux.no/linux+v2.6.26.5/include/asm-m32r/uaccess.h#L108 <- access_ok just within memory or not 2008-09-16 20:32 no I think it checks by hand, but only returns EFAULT if first part is bad, otherwise it marks how many are good, and ignore the rest 2008-09-16 20:33 vfs_check_frozen implements the filesystem "freeze" feature... which is used for snapshotting 2008-09-16 20:33 kind of misconceived 2008-09-16 20:33 so you'll get a partial write instead of an EFAULT if you have a bad mapping in the middle of a writev 2008-09-16 20:33 sounds reasonable 2008-09-16 20:33 can't realy on mmu since we probably will use dma 2008-09-16 20:34 then we have a bunch of code associated with direct IO 2008-09-16 20:34 which we are going to skip 2008-09-16 20:34 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2319 2008-09-16 20:34 maze, true 2008-09-16 20:34 so we're going to check access somewhere 2008-09-16 20:34 but not here 2008-09-16 20:35 notice, no real work got done 2008-09-16 20:35 we're still just deepening the call chain and allowing for various options and whatnot 2008-09-16 20:35 at this point, we're seriously not expecting any real work to get done ;-) 2008-09-16 20:35 then we get to generic_file_buffered_write 2008-09-16 20:35 ACTION does! :D 2008-09-16 20:35 think that's going to do work? 2008-09-16 20:36 nope 2008-09-16 20:36 you'd be right 2008-09-16 20:36 short break 2008-09-16 20:36 while I fill the wine glass 2008-09-16 20:37 wine? 
i thought u wanted beer 2008-09-16 20:37 ;) 2008-09-16 20:37 nobody sent any 2008-09-16 20:37 aww 2008-09-16 20:37 ok here we go again 2008-09-16 20:38 ACTION thinks a_ops->write_begin must be the key... 2008-09-16 20:38 we have a ->write_begin option 2008-09-16 20:38 which is new for me 2008-09-16 20:38 the two functions are right next to each other 2008-09-16 20:38 and look similar 2008-09-16 20:38 and that 2copy thing, likewise 2008-09-16 20:38 probably something aio related 2008-09-16 20:38 looks like braindamage 2008-09-16 20:39 the 2copy is also using some a_ops 2008-09-16 20:39 notice a_ops 2008-09-16 20:39 is struct address_space_operations 2008-09-16 20:40 http://lxr.linux.no/linux+v2.6.26.5/include/linux/fs.h#L444 2008-09-16 20:40 lost the scent for a moment 2008-09-16 20:40 ACTION knows readpage from romfs... 2008-09-16 20:40 sounds mmap-ish 2008-09-16 20:42 ACTION has to go to work :( 2008-09-16 20:42 ACTION says bbyee, do post the logs ... 2008-09-16 20:42 guessing a_ops are operations that can be performed on mmaped fs pages 2008-09-16 20:42 with ability for fs to override it to trigger journaling etc 2008-09-16 20:42 bye bye 2008-09-16 20:42 ok, this code has been "worked on" 2008-09-16 20:42 rearranged hopefully for a good reason 2008-09-16 20:43 readpage is the only 'read' the romfs is doing 2008-09-16 20:43 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2231 2008-09-16 20:43 so its called not only for mmap stuff 2008-09-16 20:43 generic_perform_write 2008-09-16 20:43 that may be an optimization though 2008-09-16 20:43 this is where the real action happens 2008-09-16 20:43 who knows...
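The generic_segment_checks walk-through above boils down to three rules for a writev:
- It returns -EINVAL if any segment length is negative or the running total wraps.
- It returns -EFAULT only if the first segment is unusable.
- Otherwise it trims the vector to the good prefix, so the caller gets a short write.

The model below assumes a hypothetical `toy_access_ok` that treats only NULL bases as bad. The real access_ok checks the address range against the user address limit.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical stand-in for access_ok(): only NULL bases are bad here. */
static int toy_access_ok(const void *base, size_t len)
{
    (void)len;
    return base != NULL;
}

/* Model of generic_segment_checks(): validate an iovec array, sum the
 * lengths into *count, and trim *nr_segs at the first bad segment. */
int toy_segment_checks(const struct iovec *iov, unsigned long *nr_segs,
                       size_t *count)
{
    size_t cnt = 0;

    for (unsigned long seg = 0; seg < *nr_segs; seg++) {
        const struct iovec *iv = &iov[seg];

        /* negative length, or cumulative length wrapping negative */
        cnt += iv->iov_len;
        if ((ssize_t)(cnt | iv->iov_len) < 0)
            return -EINVAL;
        if (toy_access_ok(iv->iov_base, iv->iov_len))
            continue;
        if (seg == 0)
            return -EFAULT;      /* nothing usable at all */
        *nr_segs = seg;          /* keep the good prefix: partial write */
        cnt -= iv->iov_len;
        break;
    }
    *count = cnt;
    return 0;
}
```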
2008-09-16 20:43 or one form of real action 2008-09-16 20:43 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2231 2008-09-16 20:43 we're going to talk about a_ops 2008-09-16 20:44 this is the key to most filesystem io in linux 2008-09-16 20:45 ok, so here is a typical write mem 2008-09-16 20:45 write_begin, write_end 2008-09-16 20:45 right 2008-09-16 20:45 and in between we copy data from userspace 2008-09-16 20:45 onto a page 2008-09-16 20:45 copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes); 2008-09-16 20:45 so what is in write_begin? probably get a page into the page cache of an inode 2008-09-16 20:46 and write_end will send that page down to the hardware 2008-09-16 20:46 looks like the kernel basically mmaps in the page and then mmaps it out 2008-09-16 20:46 copy_from_user gets the data, and generates EFAULT if necessary 2008-09-16 20:46 either because of illegal access, or page swapped out 2008-09-16 20:47 pagefault_disable(); 2008-09-16 20:47 uhm? 2008-09-16 20:47 things get interested in the page was swapped out to a swapfile on the same filesystem 2008-09-16 20:47 interesting 2008-09-16 20:47 swapfile on the same filesystem?? 2008-09-16 20:47 right 2008-09-16 20:47 swapfile is not a separate fs?
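The generic_perform_write loop handles at most one page per pass: ->write_begin, copy the user bytes into the page, ->write_end, then advance. The in-page offset comes from the low bits of pos. Below is a userspace sketch with a toy page cache. Several parts are assumptions:
- the 4096-byte page;
- the fixed eight-page array;
- zero-filling new pages instead of reading their old contents;
- plain memcpy in place of iov_iter_copy_from_user_atomic.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE (1UL << TOY_PAGE_SHIFT)
#define TOY_NPAGES 8

struct toy_page {
    int present;
    int dirty;
    char data[TOY_PAGE_SIZE];
};

/* Hypothetical per-inode page cache ("address space"). */
struct toy_mapping {
    struct toy_page pages[TOY_NPAGES];
    long long i_size;
};

/* ->write_begin: find or create the cache page for this index. */
static struct toy_page *toy_write_begin(struct toy_mapping *m,
                                        unsigned long index)
{
    if (index >= TOY_NPAGES)
        return NULL;
    struct toy_page *page = &m->pages[index];
    if (!page->present) {
        /* a real fs may have to read the old block in here */
        memset(page->data, 0, TOY_PAGE_SIZE);
        page->present = 1;
    }
    return page;
}

/* ->write_end: mark the page dirty and extend i_size if needed. */
static size_t toy_write_end(struct toy_mapping *m, struct toy_page *page,
                            long long pos, size_t copied)
{
    page->dirty = 1;
    if (pos + (long long)copied > m->i_size)
        m->i_size = pos + (long long)copied;
    return copied;
}

/* Model of the generic_perform_write loop: one page per pass. */
ssize_t toy_perform_write(struct toy_mapping *m, const char *buf,
                          size_t count, long long pos)
{
    ssize_t written = 0;

    while (count) {
        unsigned long offset = (unsigned long)pos & (TOY_PAGE_SIZE - 1);
        unsigned long index = (unsigned long)(pos >> TOY_PAGE_SHIFT);
        size_t bytes = TOY_PAGE_SIZE - offset;
        if (bytes > count)
            bytes = count;

        struct toy_page *page = toy_write_begin(m, index);
        if (!page)
            break;                               /* toy size limit */
        memcpy(page->data + offset, buf, bytes); /* kernel: copy from user */
        size_t copied = toy_write_end(m, page, pos, bytes);

        buf += copied;
        count -= copied;
        pos += (long long)copied;
        written += (ssize_t)copied;
    }
    return written;
}
```

A 5000-byte write at offset 4000 touches three pages: 96 bytes, then a full 4096, then 808. Only the middle page is fully overwritten.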
2008-09-16 20:47 trying to prevent recursive fault 2008-09-16 20:47 sounds like that just turned off page-in 2008-09-16 20:47 I don't have the details at hand just now 2008-09-16 20:48 razvanm, swap can be separate, or it can be on a filesystem 2008-09-16 20:48 there are some nasty possible recursions when its on a filesystem 2008-09-16 20:48 very nasty 2008-09-16 20:48 ACTION doesn't know how to create a swap on a fs :| 2008-09-16 20:48 2 minutes until question time 2008-09-16 20:49 it's going to be another "cliffhanger" ending 2008-09-16 20:49 :-) 2008-09-16 20:49 lol 2008-09-16 20:49 now this function is not very instructive 2008-09-16 20:49 because it doesn't directly use the page cache ops 2008-09-16 20:49 it provides hooks for them 2008-09-16 20:49 are you sure we went into the right function? not the 2copy one? 2008-09-16 20:49 let's see if we can pop out and find a variant that does use the page cache ops 2008-09-16 20:50 I'm sure we didn't 2008-09-16 20:50 somebody has been messing with names 2008-09-16 20:50 I hope it was for a good reason 2008-09-16 20:50 it isn't always 2008-09-16 20:50 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2063 2008-09-16 20:50 and as you can see, the call chain is kind of unreasonably deep 2008-09-16 20:50 this all seems extremely complex 2008-09-16 20:51 for now I can't say unnecessarily... but... 2008-09-16 20:51 what does the 2copy mean? 
2008-09-16 20:51 yes, this looks like what remains of good old generic_write 2008-09-16 20:51 it means brain-dead original 1st copy apparently 2008-09-16 20:51 maze, I am happy to have reached your "complex" threshold 2008-09-16 20:51 it gets more complex 2008-09-16 20:52 in _2copy, we will alloc pages, map them into a page cache, copy data onto them, and submit them to disk 2008-09-16 20:52 we will call the fs's ->write_page method to do the latter 2008-09-16 20:53 and that method will figure out _where_ on disk the page should go 2008-09-16 20:53 I don't know wyat 2copy means 2008-09-16 20:53 why do we have to copy_from_user 2008-09-16 20:53 can't we write directly from userspace data? 2008-09-16 20:53 feels like... wanking... but I will know for sure for thursdays's session 2008-09-16 20:54 maze, because this is _buffered_ write 2008-09-16 20:54 we are placing the data in cache 2008-09-16 20:54 oh, right 2008-09-16 20:54 we can't just place references to pages in cache 2008-09-16 20:54 because the user data is not necessarily properly aligned 2008-09-16 20:54 couldn't we just rip the page out from under the user, and give him a r/o cow page? 2008-09-16 20:54 linus does want to attempt something like that 2008-09-16 20:54 but it's too hard, even for him 2008-09-16 20:55 ACTION doesn't see the write_page.... 
2008-09-16 20:55 me neither 2008-09-16 20:55 there is prepare_write 2008-09-16 20:55 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2192 2008-09-16 20:55 homework is: see the writepage 2008-09-16 20:55 ;-) 2008-09-16 20:55 and commit_write 2008-09-16 20:55 on thursday we will pick up at the writepage 2008-09-16 20:56 I'm not sure why there would need to be a write page 2008-09-16 20:56 yep, it looks like _2copy really is the new incarnation of generic_write 2008-09-16 20:56 it used to just be generic_write 2008-09-16 20:56 but then it started getting more and more "wrapped" 2008-09-16 20:56 until we see this thing 2008-09-16 20:56 unreadable thing you could say 2008-09-16 20:56 :-) 2008-09-16 20:57 maze, the purpose of the ->writepages in there is to get dirty, buffered pages onto disk 2008-09-16 20:57 -!- kbingham(~kbingham@92.20.210.138) has joined #tux3 2008-09-16 20:57 won't commit_write do that? 2008-09-16 20:57 ah, that's what you asked 2008-09-16 20:57 why two 2008-09-16 20:57 no good reason actually 2008-09-16 20:58 there's usually a "prepare_write" and a "commit_write" 2008-09-16 20:58 http://lxr.linux.no/linux+v2.6.26.5/include/linux/fs.h#L458 2008-09-16 20:58 one or the other generally doesn't do much 2008-09-16 20:59 there's a writepage, writepages, prepare_write, commit_write, write_begin, write_end ... 2008-09-16 20:59 pick'n'choose 2008-09-16 20:59 yes 2008-09-16 20:59 big mess 2008-09-16 20:59 linux IO is trying to find its identity 2008-09-16 20:59 lol 2008-09-16 21:00 it was simpler and nicer in the past?
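In the 2.6.26 source, generic_file_buffered_write appears to choose between two paths. It uses generic_perform_write when the filesystem's a_ops provide ->write_begin. Otherwise it uses generic_perform_write_2copy, which is built on the older ->prepare_write/->commit_write pair. The sketch models that "pick'n'choose" with a trimmed, hypothetical operations table. The hook signatures are invented for brevity, and `toy_trace` just records which hooks ran.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical, heavily trimmed address_space_operations: a filesystem
 * fills in the hooks it implements and leaves the others NULL. */
struct toy_aops {
    int (*write_begin)(unsigned long index);
    int (*write_end)(unsigned long index);
    int (*prepare_write)(unsigned long index);
    int (*commit_write)(unsigned long index);
};

static char toy_trace[16];               /* records which hooks ran */

static void toy_log(char c)
{
    size_t n = strlen(toy_trace);
    if (n + 1 < sizeof toy_trace) {
        toy_trace[n] = c;
        toy_trace[n + 1] = '\0';
    }
}

static int toy_wb(unsigned long i) { (void)i; toy_log('B'); return 0; }
static int toy_we(unsigned long i) { (void)i; toy_log('E'); return 0; }
static int toy_pw(unsigned long i) { (void)i; toy_log('P'); return 0; }
static int toy_cw(unsigned long i) { (void)i; toy_log('C'); return 0; }

const struct toy_aops toy_new_style = { .write_begin = toy_wb, .write_end = toy_we };
const struct toy_aops toy_old_style = { .prepare_write = toy_pw, .commit_write = toy_cw };

/* Write one page through whichever protocol the filesystem offers,
 * preferring write_begin/write_end. */
int toy_write_page(const struct toy_aops *a, unsigned long index)
{
    int err;

    if (a->write_begin) {                        /* generic_perform_write */
        err = a->write_begin(index);
        /* ...copy user data into the page here... */
        return err ? err : a->write_end(index);
    }
    if (!a->prepare_write || !a->commit_write)   /* toy addition */
        return -EINVAL;
    err = a->prepare_write(index);               /* the _2copy path */
    /* ...copy user data into the page here... */
    return err ? err : a->commit_write(index);
}
```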
2008-09-16 21:00 beginning of 2.6 was simpler, yes 2008-09-16 21:00 o_direct is a very good thing, but it added considerable complexity 2008-09-16 21:00 it looks like different file systems use different interfaces 2008-09-16 21:00 likewise aio 2008-09-16 21:01 maze, somewhat true 2008-09-16 21:01 almost everybody uses generic_write 2008-09-16 21:01 and thus we have a lot 2008-09-16 21:01 not much global structural analysis goes on 2008-09-16 21:01 so that the structure can be simplified 2008-09-16 21:01 because that doesn't add new features 2008-09-16 21:01 or fix bugs 2008-09-16 21:02 are the address_space_operations fs internal? 2008-09-16 21:02 introduces them more likely 2008-09-16 21:02 or are they more global mm? 2008-09-16 21:02 but it makes the code messy 2008-09-16 21:02 like many such things in linux, they are usually library methods 2008-09-16 21:02 kernel library 2008-09-16 21:02 which the fs can lightly wrap 2008-09-16 21:02 or use directly 2008-09-16 21:03 the ->writepages thing is a relatively new invention 2008-09-16 21:03 that allows the filesystem to map more than one page at a time for IO 2008-09-16 21:03 lead to nice benchmark improvements 2008-09-16 21:03 and more mess in filemap.c 2008-09-16 21:03 and this is where variable page sizes will get interesting 2008-09-16 21:04 filemap.c is where most of the impact is, yes 2008-09-16 21:04 insightful 2008-09-16 21:04 4 minutes over ;) 2008-09-16 21:04 how did we do for pacing today? 
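->writepages lets the filesystem send more than one page per IO. A rough illustration of why that pays off: coalesce the sorted indices of dirty pages into contiguous runs. Each run could then become a single request instead of one request per page. The helper is hypothetical. The kernel's real machinery (mpage_writepages and friends) also has to check that the pages are contiguous on disk, not just in the file.

```c
#include <stddef.h>

struct toy_run {
    unsigned long start;   /* first page index in the run */
    unsigned long len;     /* number of consecutive pages */
};

/* Coalesce sorted, distinct dirty page indices into contiguous runs.
 * Returns the number of runs stored in runs[] (at most max_runs). */
size_t toy_coalesce(const unsigned long *dirty, size_t n,
                    struct toy_run *runs, size_t max_runs)
{
    size_t nruns = 0;

    for (size_t i = 0; i < n; i++) {
        if (nruns && dirty[i] == runs[nruns - 1].start + runs[nruns - 1].len) {
            runs[nruns - 1].len++;        /* extends the current run */
            continue;
        }
        if (nruns == max_runs)
            break;                        /* caller's array is full */
        runs[nruns].start = dirty[i];
        runs[nruns].len = 1;
        nruns++;
    }
    return nruns;
}
```

For dirty pages {3, 4, 5, 9, 10, 20} this yields three requests instead of six.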
2008-09-16 21:04 i try 2008-09-16 21:04 nice pace 2008-09-16 21:04 pretty decent I think 2008-09-16 21:05 sorry I asked so many questions 2008-09-16 21:05 ok, we will be back into write on thursday 2008-09-16 21:05 ;-) 2008-09-16 21:05 tim_dimm: ask questions - it's the only way to learn anything 2008-09-16 21:05 ACTION is not happy with the length though ;-) 2008-09-16 21:05 homework is: find the implementations of the ->writepage calls in ext2 2008-09-16 21:05 I was just trying to figure out what / where to read 2008-09-16 21:05 never been inside the kernel like that before 2008-09-16 21:06 it's bizarre, isn't it 2008-09-16 21:06 yeah 2008-09-16 21:06 so here's a question: buffered, aio, o_direct - what are the permutations/combinations, what do they mean, and how do they interact with each other if the same spot is being accessed via different means 2008-09-16 21:06 maze, very good question, and the answer is: with considerable complexity 2008-09-16 21:06 lovely answer 2008-09-16 21:06 it is necessary to maintain cache consistency with all possible combinations 2008-09-16 21:07 that's like my friend at work, who sits next to me and regularly answers either/or questions with a 'yes' spoken in a deadpan voice 2008-09-16 21:07 are there hooks for cache consistency or is it handle another way? 2008-09-16 21:07 that is why that section handling o_direct that we skipped is so... um... interesting 2008-09-16 21:07 tim_dimm, the vfs handles it 2008-09-16 21:07 and there are rules that the fs has to follow 2008-09-16 21:07 O_DIRECT means unbuffered straight to disk, right? 2008-09-16 21:08 basically "do not skate over that cliff" 2008-09-16 21:08 and is pretty meaningless for read... 
2008-09-16 21:08 maze, right 2008-09-16 21:08 o_direct write has to invalidate any buffer data at that point 2008-09-16 21:08 all synchronous io should be easily implementable via aio 2008-09-16 21:08 also flush out dirty buffered data in that range 2008-09-16 21:08 did you guys cover vfs on another tux3 night? 2008-09-16 21:08 maze, it is 2008-09-16 21:09 tim_dimm, partly 2008-09-16 21:09 this is part of the vfs we're doing now 2008-09-16 21:09 so you basically need to support {buffered | direct } asynchronous io 2008-09-16 21:09 would it be worthwhile to have an entire session on it?' 2008-09-16 21:09 we did an easy one first 2008-09-16 21:09 maze, yes 2008-09-16 21:09 in fact we already looked at the functions that support it 2008-09-16 21:10 tim_dimm, that was essentially the first session 2008-09-16 21:10 o_direct write has to invalidate any buffered data at that point - uh? 2008-09-16 21:10 k, I'll revisit in the logs 2008-09-16 21:10 maze, yes 2008-09-16 21:10 buffered data for what? 2008-09-16 21:10 somebody might have been reading/writing the device with buffered ops at the same time 2008-09-16 21:11 this is not uncommon 2008-09-16 21:11 oh, the buffered but not yet written stuff gets dropped? 2008-09-16 21:11 flushed to disk 2008-09-16 21:11 or overwritten with the - so flushed, not invalidated 2008-09-16 21:11 what gets invalidateD? 2008-09-16 21:11 you're right, fully replaced pages get dropped 2008-09-16 21:11 partially replaced pages have to be flushed 2008-09-16 21:12 so it's not so much invalidated, as overwritten and thus dropped/replaced with the new data 2008-09-16 21:12 right 2008-09-16 21:12 haven't spent a lot of time in that code myself 2008-09-16 21:12 but that's correct 2008-09-16 21:12 does O_DIRECT mean anything on read? 
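The rule just worked out for an O_DIRECT write that overlaps cached data reduces to page arithmetic. Pages the write replaces completely can be dropped, and pages it touches only partly must be flushed first. This sketch only classifies one page. The 4096-byte page size and the two-way classification follow the discussion above, not the kernel's exact algorithm. As far as we know, the kernel writes back and invalidates the whole affected range around a direct write.

```c
#define TOY_PAGE_SIZE 4096ULL

enum toy_page_action { TOY_DROP, TOY_FLUSH_FIRST };

/* For a direct write of len bytes at pos (len > 0), decide what to do
 * with the cached page at index: drop it if the write replaces all of
 * it, flush it first if the write covers only part of it.
 * Returns -1 if the page lies outside the written range. */
int toy_classify_page(unsigned long long pos, unsigned long long len,
                      unsigned long long index)
{
    unsigned long long page_start = index * TOY_PAGE_SIZE;
    unsigned long long page_end = page_start + TOY_PAGE_SIZE; /* exclusive */
    unsigned long long end = pos + len;                        /* exclusive */

    if (page_end <= pos || page_start >= end)
        return -1;                 /* untouched by this write */
    if (pos <= page_start && end >= page_end)
        return TOY_DROP;           /* fully overwritten */
    return TOY_FLUSH_FIRST;        /* partially overwritten */
}
```

A write of bytes 1000..10999 fully covers only page 1. Pages 0 and 2 are partial, and page 3 is untouched.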
2008-09-16 21:12 yes 2008-09-16 21:13 will not read from buffer afaic 2008-09-16 21:13 Try to minimize cache effects of the I/O to and from this file 2008-09-16 21:13 but I could be wrong 2008-09-16 21:13 according to man open, basically skip buffer cache populating 2008-09-16 21:13 anything not buffered is read directly from disk and not added to the page cache 2008-09-16 21:13 unless already there 2008-09-16 21:13 so o_direct read avoids double buffering 2008-09-16 21:14 O_DIRECT (Since Linux 2.4.10) 2008-09-16 21:14 Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special 2008-09-16 21:14 situations, such as when applications do their own caching. File I/O is done directly to/from user space buffers. The I/O is syn- 2008-09-16 21:14 chronous, that is, at the completion of a read(2) or write(2), data is guaranteed to have been transferred. See NOTES below for further 2008-09-16 21:14 discussion. 2008-09-16 21:14 A semantically similar (but deprecated) interface for block devices is described in raw(8). 2008-09-16 21:14 I'm not sure what it does with already-buffered data 2008-09-16 21:14 if dirty then it _must_ use the dirty version 2008-09-16 21:14 so, how expensive is a write to read only page fault? 2008-09-16 21:14 from man 2 open, sorry for the long lines 2008-09-16 21:14 but I don't know if it does that by flushing it first, then reading it back, or doing buffered read just for that bit 2008-09-16 21:14 yeah, found it 2008-09-16 21:14 doesn't look like there's any requirement to flush 2008-09-16 21:15 seems like O_DIRECT read is meant for access once - not worth caching - data 2008-09-16 21:15 yes 2008-09-16 21:15 still leaves the question about what it does with pages already in cache, or dirty in cache 2008-09-16 21:15 it says minimize 2008-09-16 21:15 shall we leave that as your homework? 
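Besides its caching behaviour, O_DIRECT expects the user buffer address, file offset and transfer length to be aligned. The required granularity depends on the kernel version, filesystem and device; 512 bytes or the logical block size are common. Below is a minimal helper, assuming a power-of-two alignment supplied by the caller, plus an allocation via posix_memalign. The `toy_` names are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* True if buf, offset and len are all multiples of align
 * (align must be a power of two, e.g. 512 or 4096). */
bool toy_dio_aligned(const void *buf, off_t offset, size_t len, size_t align)
{
    uintptr_t mask = (uintptr_t)align - 1;
    return ((uintptr_t)buf & mask) == 0 &&
           ((uintmax_t)offset & mask) == 0 &&
           (len & mask) == 0;
}

/* Allocate a buffer that satisfies the alignment rule; NULL on failure.
 * Release it with free(). */
void *toy_dio_alloc(size_t len, size_t align)
{
    void *p = NULL;
    if (posix_memalign(&p, align, len) != 0)
        return NULL;
    return p;
}
```

Checking before calling read or write on an O_DIRECT descriptor turns an opaque EINVAL from the kernel into a clear application-side error.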
2008-09-16 21:16 not ignore cache 2008-09-16 21:16 can't rely on the man page 2008-09-16 21:16 the pages should not be dirty for too long 2008-09-16 21:16 have to read the code 2008-09-16 21:16 :D 2008-09-16 21:17 from NOTES 2008-09-16 21:17 Applications should avoid mixing O_DIRECT and normal I/O to the same 2008-09-16 21:17 file, and especially to overlapping byte regions in the same file. 2008-09-16 21:17 Even when the filesystem correctly handles the coherency issues in this 2008-09-16 21:17 situation, overall I/O throughput is likely to be slower than using 2008-09-16 21:17 either mode alone. Likewise, applications should avoid mixing mmap(2) 2008-09-16 21:17 of files with direct I/O to the same files. 2008-09-16 21:17 one thing you see is that o_direct has to be constantly checking the page cache to be sure nothing is aliased there 2008-09-16 21:17 "The thing that has always disturbed me about O_DIRECT is that 2008-09-16 21:17 the whole interface is just stupid, and was probably designed by 2008-09-16 21:17 a deranged monkey on some serious mind-controlling substances." 
2008-09-16 21:17 — Linus 2008-09-16 21:17 maze, the advice is often ignored 2008-09-16 21:18 linux is not absolved from responsibiltiy for keeping the cache consistent 2008-09-16 21:18 right 2008-09-16 21:18 linus doesn't run a database company 2008-09-16 21:18 lol 2008-09-16 21:18 which is why he thinks that 2008-09-16 21:18 the interface is quite simple 2008-09-16 21:19 open with o_direct, make sure your data is aligned 2008-09-16 21:20 hi all 2008-09-16 21:20 maze, how'd you do with reading your superblock 2008-09-16 21:20 hey 2008-09-16 21:20 shapor, right on time ;) 2008-09-16 21:20 I slept well, thank you ;-) 2008-09-16 21:20 good thing we have logs 2008-09-16 21:20 yeah 2008-09-16 21:20 reading now 2008-09-16 21:20 I'm going to be working on it now 2008-09-16 21:20 maze, that little subproject will be highly instructive 2008-09-16 21:21 agreed 2008-09-16 21:21 it already has been 2008-09-16 21:21 especially if you write your own custom endio 2008-09-16 21:21 and figure out how to have your task (which is "mount") wait on a wait queue for the io to complete 2008-09-16 21:21 exactly 2008-09-16 21:22 well, it's the in-kernel portion of mount 2008-09-16 21:22 it's all not very much code, but each line takes about 15 minutes of study 2008-09-16 21:22 or maybe an hour the first time 2008-09-16 21:22 I expect I need something, sleep on something, wake something from endio 2008-09-16 21:22 precisely 2008-09-16 21:22 apparently something called a waitqueue 2008-09-16 21:23 the waiting bits are covered in a nice tutorial manner on lwn 2008-09-16 21:23 so probably something like a dynamic init of a waitqueue 2008-09-16 21:23 ACTION is off to bed. Tomorrow he needs to be early at school. 2008-09-16 21:23 then submit io 2008-09-16 21:23 bio is... 
an acquired taste 2008-09-16 21:23 then sleep on wq 2008-09-16 21:23 acquired ore 2008-09-16 21:23 acquired lore 2008-09-16 21:23 in endio wake wq 2008-09-16 21:23 more like acquired love 2008-09-16 21:23 exactly 2008-09-16 21:23 probably using the "wake" function 2008-09-16 21:24 that sounds awesome 2008-09-16 21:24 and either wake or wakeall likely 2008-09-16 21:24 here wakeall being more appropriate 2008-09-16 21:24 usually wake 2008-09-16 21:24 no need for a thundering herd 2008-09-16 21:24 of course you know there is only one waiter 2008-09-16 21:25 there better not be more, or something else broke 2008-09-16 21:25 well, but in general, since the op is complete - I should wake all 2008-09-16 21:25 interesting question then is how to dealloc the wq 2008-09-16 21:25 must be some put_wq in the waiters 2008-09-16 21:25 which on last dec to zero does free 2008-09-16 21:25 next move for me is to drop over to whole foods to pick up some munchies 2008-09-16 21:26 I only have a few more days left as a bachelor 2008-09-16 21:26 before the girls get back ;) 2008-09-16 21:26 flips: hah thats where i was instead of class 2008-09-16 21:26 at which time I'm afraid my checking rate will drop somewhat 2008-09-16 21:26 linux/wait.h 2008-09-16 21:26 didn't think it'd be so early 2008-09-16 21:26 checkin 2008-09-16 21:26 shapor, 8 pm tue and thur 2008-09-16 21:28 hmm 2008-09-16 21:28 looks like it's too late for whole food 2008-09-16 21:28 unless I really run 2008-09-16 21:28 don't feel like really running 2008-09-16 21:28 maybe it's 3rd street for dinner tonight 2008-09-16 21:28 so i need to make a dynamic wq, init with init_waitqueue_head() 2008-09-16 21:28 yes, and there are various convenience wrappers 2008-09-16 21:29 best is to write it on the metal the first time 2008-09-16 21:30 #define wake_up_all(x) __wake_up(x, TASK_NORMAL, 0, NULL) 2008-09-16 21:30 well if I don't go shopping there will be no coffee for breakfast 2008-09-16 21:30 seems to be the way to wake 2008-09-16 
21:30 so I'm gone... 2008-09-16 21:30 folks 2008-09-16 21:31 hi bh 2008-09-16 21:32 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has left #tux3 2008-09-16 21:32 -!- cdk(~chinmay@121.246.33.227) has joined #tux3 2008-09-16 21:34 interesting 2008-09-16 21:34 how did I become a contributor on zumastor? 2008-09-16 21:35 aposter 2008-09-16 21:35 ah 2008-09-16 21:35 u do that? 2008-09-16 21:37 flips:are the latest tuxfs binaries working fine for everyone? 2008-09-16 21:38 i am getting segfaults for each file that i copy 2008-09-16 21:38 sync rootdir 2008-09-16 21:38 filemap_blockio: write 2008-09-16 21:38 devmap_blockio: read [8] 2008-09-16 21:38 devmap_blockio: read [9] 2008-09-16 21:38 balloc -> [10] 2008-09-16 21:38 new group at 0 2008-09-16 21:38 insert 0x0 at 0 in group 0 2008-09-16 21:38 limit = 0, free = 4088 2008-09-16 21:38 save_inode: save inode 0xd 2008-09-16 21:39 lookup inode 0xd, 0 + d 2008-09-16 21:39 resize inum 0xd at 0x58 from 18 to 28 2008-09-16 21:39 sync atom table 2008-09-16 21:39 Segmentation fault 2008-09-16 21:41 thats the inode.c right? 2008-09-16 21:41 or is that tux3 fuse? 2008-09-16 21:42 tux3 fuse running in the foreground 2008-09-16 21:42 let me try to reproduce 2008-09-16 21:43 did you try running under gdb? 2008-09-16 21:43 no .. that i did not 2008-09-16 21:43 probably have to attach to it after you start it i haven't tried yet 2008-09-16 21:45 cdk: you're running tux3fs right? 2008-09-16 21:45 not tux3fuse 2008-09-16 21:45 yeah tux3fs 2008-09-16 21:46 ah yes, happens for me too 2008-09-16 21:46 i am sure it worked before... 2008-09-16 21:47 i mean two days ago 2008-09-16 21:48 yeah appears to be on write 2008-09-16 21:48 new bug 2008-09-16 21:49 not sure why its sync'ing atom table at all 2008-09-16 21:49 hrm 2008-09-16 21:50 btw...ls is yet to work for tux3fuse isnt it? 
2008-09-16 21:50 yeah i think tux3fuse is very broken 2008-09-16 21:51 but perhaps a better approach 2008-09-16 21:51 using the "low level" api 2008-09-16 21:51 yes. 2008-09-16 21:54 anyways...i need to go...will keep track of the changes. 2008-09-16 21:54 hopefully we resolve this soon. 2008-09-16 21:54 cdk: thanks for the bug report... i'm on it 2008-09-16 21:54 i think it see whats wrong 2008-09-16 22:00 -!- amey(~amey@116.73.35.180) has left #tux3 2008-09-16 22:00 yes 2008-09-16 22:01 kernel locked ;-) 2008-09-16 22:09 heh 2008-09-16 22:09 fun 2008-09-16 22:09 whatd you do 2008-09-16 22:10 cdk: fixed :) 2008-09-16 22:11 oh hes gone 2008-09-16 22:11 cdk, no doubt it's my fault 2008-09-16 22:11 I didn't try it 2008-09-16 22:12 probably have to attach to it after you start it i haven't tried yet <- or just change the makefile 2008-09-16 22:13 segfault in atom stuff... no surprise 2008-09-16 22:13 -!- pranith(ca4bcee2@webchat.mibbit.com) has joined #tux3 2008-09-16 22:23 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-16 22:24 back from kernel lala land 2008-09-16 22:24 hmm 2008-09-16 22:24 hello all 2008-09-16 22:25 maze, log of tux univ posted? 2008-09-16 22:25 yes 2008-09-16 22:26 http://shapor.com/tux3/irclogs/current.txt 2008-09-16 22:26 okay, looks like I definitely need to put a little effort into making my system more debuggable 2008-09-16 22:27 that was a totally harmless piece of code - one'd think 2008-09-16 22:29 uh 2008-09-16 22:29 ugh 2008-09-16 22:29 stupid thing 2008-09-16 22:29 this one takes an object, that one takes a pointer to an object... 2008-09-16 22:29 and they're all macros, so who'd guess 2008-09-16 22:30 okay, so that actually works 2008-09-16 22:30 waits the appropriate number of jiffies 2008-09-16 22:31 i'm going to split up those logs soon 2008-09-16 22:31 gotta make a cron job 2008-09-16 22:31 that way we can link to TuxU sessions 2008-09-16 22:31 by day? by week? by month? 
2008-09-16 22:32 not sure yet 2008-09-16 22:32 i have this script http://zumastor.org/irclogs/ 2008-09-16 22:32 leave current the way it is 2008-09-16 22:32 yeah i like the one big long 2008-09-16 22:32 log 2008-09-16 22:32 easy to grep ;) 2008-09-16 22:32 would be nice to hit record and stop for tux3 U 2008-09-16 22:32 lol 2008-09-16 22:32 I love the most recent conversation 2008-09-16 22:33 #zumastor doesn't get a lot of traffic these days 2008-09-16 22:33 wonder why 2008-09-16 22:33 what's up with zumastor? 2008-09-16 22:33 not much these days 2008-09-16 22:34 no one really works on it anymore 2008-09-16 22:34 its waiting for tux3 goodness to be backported 2008-09-16 22:34 really? 2008-09-16 22:34 to increase performance 2008-09-16 22:34 yup 2008-09-16 22:35 I guess that's kind of sad 2008-09-16 23:55 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-09-17 00:23 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-17 00:31 -!- tim_vimm(~Tim@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-17 00:40 -!- pgquiles(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-09-17 00:44 ok, I have a basic idea how I'm going to make extent creation happen in inode.c 2008-09-17 00:44 may slightly depart from tradition and not write a design note first 2008-09-17 00:45 -!- tim_vimm(~Tim@cpe-76-90-98-247.socal.res.rr.com) has left #tux3 2008-09-17 00:45 so its a surprise? 
2008-09-17 01:07 http://www.phoronix.com/scan.php?page=news_item&px=NjcyNQ 2008-09-17 01:08 "An Update On The Tux3 File-System" 2008-09-17 01:08 very nice little news piece 2008-09-17 01:09 oh, and "Tux3 Report" is somehow got to #1 on the lkml.org hot list 2008-09-17 01:09 the next three posts are linus, then alan cox, then "time travel", then the original Tux3 announcement 2008-09-17 01:19 http://lwn.net/Articles/296568/ 2008-09-17 01:19 some comments there 2008-09-17 01:19 regarding the namespace 2008-09-17 01:20 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-17 01:20 geeks firing on random mode 2008-09-17 01:20 doesn't mean a thing until kernel untar works 2008-09-17 01:21 don't see anything on the namespace 2008-09-17 01:21 oh 2008-09-17 01:21 right 2008-09-17 01:21 "taking back teh tux" 2008-09-17 01:22 http://www.hpcwire.com/features/Cray_Unveils_Personal_Supercomputer.html 2008-09-17 01:22 noticed that one 2008-09-17 01:22 msft involvement 2008-09-17 01:22 got to be a disaster 2008-09-17 01:22 ack 2008-09-17 01:22 trying to find a higher waste-to-recovery ratio than even xbox I think 2008-09-17 01:23 tim_dimm, shap & I had a nice midnight skate, sorry we forgot to ping you 2008-09-17 01:24 don't know how we overlooked that 2008-09-17 01:24 it won't happen again 2008-09-17 01:24 dang- I would have rolled 2008-09-17 01:24 I know 2008-09-17 01:24 ACTION kicks /me 2008-09-17 01:24 ACTION agrees 2008-09-17 01:25 well I drown my sorrows in a glass of cabernet 2008-09-17 01:25 and see if I can get some progress on extent writing 2008-09-17 01:25 I spent the evening hanging shelves 2008-09-17 01:26 that's fun too 2008-09-17 01:26 loads 2008-09-17 01:26 more fun than watching paint dry 2008-09-17 01:26 -!- ChanServ changed mode/#tux3 -> +o shapor 2008-09-17 01:26 uhoh 2008-09-17 01:27 that set off all kinds of alarms 2008-09-17 01:27 ACTION aims ops priv 2008-09-17 01:27 -!- ChanServ changed mode/#tux3 -> +o flips 2008-09-17 
01:28 -!- flips changed mode/#tux3 -> +o tim_dimm 2008-09-17 01:28 -!- flips changed mode/#tux3 -> +o konrad 2008-09-17 01:28 smooth operator 2008-09-17 01:28 now u r 1 2 2008-09-17 01:29 -!- flips changed mode/#tux3 -> +o tux3bot 2008-09-17 01:29 bot can kick now 2008-09-17 01:29 what you call a kickass bot 2008-09-17 01:29 hah 2008-09-17 01:29 kickbut_bot 2008-09-17 01:29 i dont think the tux has the code to do that 2008-09-17 01:30 lets find out 2008-09-17 01:32 folks 2008-09-17 01:33 <- crashes 2008-09-17 01:45 lol 2008-09-17 01:46 lots of sheriffs around now 2008-09-17 01:46 this must be the safest place on the planet 2008-09-17 01:46 ? 2008-09-17 01:46 you causing trouble again? 2008-09-17 01:46 oh 2008-09-17 01:46 all the ops 2008-09-17 01:47 -!- flips changed mode/#tux3 -> +o MaZe 2008-09-17 01:47 just don't make me use the kickbot 2008-09-17 01:48 lol 2008-09-17 01:48 I don't even know how to use *o* powers 2008-09-17 01:48 lots of chiefs not enough indians 2008-09-17 01:48 ACTION just barely refrains from kicking maze to communicate the concept 2008-09-17 01:48 I got waitqueues working 2008-09-17 01:48 flips: how's it going ? 2008-09-17 01:48 -!- flips changed mode/#tux3 -> -o flips 2008-09-17 01:48 there we are 2008-09-17 01:49 hi bh 2008-09-17 01:49 also found a bio_kern_map func 2008-09-17 01:49 pretty good, extents coming into focus 2008-09-17 01:49 yeah, I was reading something about it from your adventures with btrfs 2008-09-17 01:49 sounds... um... map what? 2008-09-17 01:49 erm, bio_map_kern 2008-09-17 01:49 apparently takes kernel data ptr and returns a bio 2008-09-17 01:50 sounds really automagic 2008-09-17 01:50 except no-one seems to use it... 2008-09-17 01:51 http://lxr.linux.no/linux+v2.6.26.5/fs/bio.c#L922 2008-09-17 01:51 add_pc_page... 
still don't know what pc stands for 2008-09-17 01:52 looks like a good exercise in taking something simple and making it look complex 2008-09-17 01:53 yes, well not sure what the extra pc means 2008-09-17 01:53 all that request queue passing looks doubtful 2008-09-17 01:53 what's it for? 2008-09-17 01:53 looks less than clean 2008-09-17 01:53 a lot of the bio code is like that 2008-09-17 01:53 http://lxr.linux.no/linux+v2.6.26.5/block/blk-map.c#L284 2008-09-17 01:53 seems to be the only place its used 2008-09-17 01:54 is this entire thing really such spaghetti? 2008-09-17 01:54 well, the request_queue, is apparently something you can pull out of the bio 2008-09-17 01:54 it's bogus to say "map kern" 2008-09-17 01:54 bio by default references kernel memory 2008-09-17 01:55 it's essentially just a vector of page headers 2008-09-17 01:55 the entire thing is pretty much that spagetti like or worse 2008-09-17 01:55 looks superficially plausible 2008-09-17 01:55 is mostly fluff when you dig 2008-09-17 01:56 the more I read this the more scared I am of running linux 2008-09-17 01:56 anyway, you now have officially arrived at the underbelly of linux 2008-09-17 01:56 few kernel hacks even look at this stuff 2008-09-17 01:56 I'd almost switch to windows... 
except better the devil you know, then devil you don't ;-) 2008-09-17 01:57 hah 2008-09-17 01:57 windows is worse with a very high degree of probability 2008-09-17 01:57 yeah, pretty sure of that 2008-09-17 01:57 bio is fast 2008-09-17 01:57 that's the redeeming thing 2008-09-17 01:57 I'm actually most annoyed about the complete lack of useful documentation 2008-09-17 01:58 yes, especially here 2008-09-17 01:58 nobody documents bios, it's new 2008-09-17 01:58 http://kerneltrap.org/man/linux/9?page=4 2008-09-17 01:58 have to let it age a bit first 2008-09-17 01:59 anyway, just write your root loader 2008-09-17 01:59 super loader 2008-09-17 01:59 then I'll kick sand at it ;) 2008-09-17 01:59 yeah, yeah, I hate writing without understanding 2008-09-17 02:00 bio is just a handle for a biovec which is just a vector of page heads with a short offset and length of data on each one 2008-09-17 02:00 it transfers to a _contiguous_ physical region 2008-09-17 02:00 right 2008-09-17 02:00 to or from 2008-09-17 02:00 the memory side can be completely discontiguous 2008-09-17 02:00 very useful 2008-09-17 02:00 it's physically contiguous? 
2008-09-17 02:00 on disk it is 2008-09-17 02:00 as in on disk 2008-09-17 02:00 right 2008-09-17 02:01 not memory 2008-09-17 02:01 that's the most important aspect of the api 2008-09-17 02:01 it's just a preadv / pwritev 2008-09-17 02:01 there is tons of cruft you can ignore connected with queueing, elevatoring, and mapping bio to dma 2008-09-17 02:01 just ignore it 2008-09-17 02:01 you only care about the length field, sector address, count of bvecs, couple of other things 2008-09-17 02:02 transfer direction 2008-09-17 02:02 list is getting short 2008-09-17 02:02 endio 2008-09-17 02:02 private field 2008-09-17 02:02 fill in the fields, submit your bio, wait fot the computer to catch fire 2008-09-17 02:03 right, except bio_add_page takes pages, and I'm still not to clear on kaddr -> page conversion 2008-09-17 02:03 so I'm trying to parse that 2008-09-17 02:03 forget that 2008-09-17 02:04 just set the bvec fields yourself 2008-09-17 02:04 you only need to "map" a page in kernel if you're going to play with the data on it 2008-09-17 02:04 that's an advantage of using buffers 2008-09-17 02:04 they're always in kernel memory 2008-09-17 02:04 but 2008-09-17 02:04 you get to set up this bio 2008-09-17 02:04 virt_to_page(data) 2008-09-17 02:04 meaning you can allocate the page it's going to read the super into 2008-09-17 02:04 offset_in_page(kaddr); 2008-09-17 02:05 seem to be relevant 2008-09-17 02:05 and make that a kernel page 2008-09-17 02:05 so you don't have to "map" it 2008-09-17 02:05 you can already address it 2008-09-17 02:05 right, so I have a kmalloc 2008-09-17 02:05 which gives me a void * kaddr 2008-09-17 02:05 offset_in_page isn't anything I've used 2008-09-17 02:05 sounds like some more wanabe api 2008-09-17 02:06 so you're saying to literally fill in all the bio fields by hand? 
seems terrible 2008-09-17 02:06 not kmalloc, you want alloc_pages 2008-09-17 02:06 order 0 2008-09-17 02:06 = one page 2008-09-17 02:06 you'll write a helper 2008-09-17 02:06 just like everybody does 2008-09-17 02:06 nope 2008-09-17 02:06 and everybody writes a crappy helper that nobody else wants to use ;) 2008-09-17 02:06 why not kmalloc? I don't need a full page. 2008-09-17 02:07 I should be fine with kmalloc 2008-09-17 02:07 you can't store it in the bvec is why 2008-09-17 02:07 and then passing the converted - it'll work 2008-09-17 02:07 you need a _page_ 2008-09-17 02:07 I see it 2008-09-17 02:07 won't work 2008-09-17 02:07 bvecs point at struct pages 2008-09-17 02:07 don't be shy about taking a full page to read the superblock 2008-09-17 02:08 it's a tiny blip in terms of kernel memory wastage 2008-09-17 02:08 eh, it compiles 2008-09-17 02:08 it'll work - I'm stubborn 2008-09-17 02:08 kay, I'll wait for the code 2008-09-17 02:09 ;-) 2008-09-17 02:12 once you have done this you have figured out a huge part of the kernel io system 2008-09-17 02:13 it's actually simple, just wrapped in layers of crud to make it look complex 2008-09-17 02:19 hmm, well I have something which actually might work 2008-09-17 02:19 now to reread the code and then test it 2008-09-17 02:23 agh, lets test it live 2008-09-17 02:33 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-17 02:33 hmm 2008-09-17 02:33 computer caught fire? 2008-09-17 02:33 not quite 2008-09-17 02:34 it didn't quite go away 2008-09-17 02:34 and I think it might have done something right 2008-09-17 02:34 but a reboot was needed 2008-09-17 02:34 no more testing live - not worth it 2008-09-17 02:34 right 2008-09-17 02:34 qemu or uml 2008-09-17 02:34 you got uml running didn't you? 
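The kmalloc-versus-alloc_pages exchange turns on how a bio segment is described. In kernels of this era a struct bio_vec holds a struct page pointer plus an offset and a length inside that page, which is why a virt_to_page(buf) / offset_in_page(buf) pair can describe a small kmalloc buffer (kmalloc memory sits in the kernel's direct mapping). A page from alloc_page(GFP_KERNEL) always starts at offset 0 and trivially holds a 512-byte superblock read; a kmalloc buffer only fits one segment if it does not straddle a page boundary. The arithmetic is sketched below in userspace; the demo_ names and the fixed 4096-byte page are assumptions for illustration, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the illustration (4096 bytes, as on x86). */
#define DEMO_PAGE_SIZE ((uintptr_t)4096)

/* Same idea as the kernel's offset_in_page(): the low bits of the address. */
static size_t demo_offset_in_page(uintptr_t addr)
{
    return (size_t)(addr & (DEMO_PAGE_SIZE - 1));
}

/* Start of the page containing addr; virt_to_page() maps an address
 * inside this page to the page's struct page. */
static uintptr_t demo_page_base(uintptr_t addr)
{
    return addr & ~(DEMO_PAGE_SIZE - 1);
}

/* One bio_vec is (page, offset, len) within a single page, so a buffer
 * fits in one segment only if it does not run past the end of its page. */
static bool demo_fits_one_bvec(uintptr_t addr, size_t len)
{
    return demo_offset_in_page(addr) + len <= DEMO_PAGE_SIZE;
}
```

A 512-byte buffer starting 0xF00 bytes into a page would need two segments, which is the case a whole page from alloc_page() never runs into.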
2008-09-17 02:34 ACTION forgets 2008-09-17 02:35 Sep 17 02:25:00 nike kernel: loop: module loaded 2008-09-17 02:35 Sep 17 02:25:15 nike kernel: 0000: FA EB 21 5B 4D 61 5A 65 42 6F 6F 74 5D 00 60 B4 2008-09-17 02:35 Sep 17 02:25:15 nike kernel: 0010: 0E BB 07 00 89 E5 8B 76 10 FF 46 10 8A 04 FE 04 2008-09-17 02:35 Sep 17 02:25:15 nike kernel: 0020: CD 10 61 C3 31 C0 8E D0 BC 00 7C FB E8 DF FF 2A 2008-09-17 02:35 Sep 17 02:25:15 nike kernel: 0030: 8E D8 89 E6 06 8E C0 57 BF 00 06 FC B9 00 01 F3 2008-09-17 02:35 Sep 17 02:25:15 nike kernel: 0040: A5 EA 57 06 00 00 56 BB 07 00 B4 0E CD 10 5E AC 2008-09-17 02:35 Sep 17 02:25:15 nike kernel: 0050: 08 C0 75 F2 F4 EB FD E8 B4 FF 5B 89 C5 BF BE 07 2008-09-17 02:35 Sep 17 02:25:15 nike kernel: 0060: B1 04 E8 A9 FF 31 80 3D 80 75 0B 09 ED BE 22 07 2008-09-17 02:35 Sep 17 02:25:15 nike kernel: 0070: 75 DD 89 FD EB 08 80 3D 00 BE 3F 07 75 D1 83 C7 2008-09-17 02:35 Sep 17 02:25:15 nike kernel: 0080: 10 E2 DF E8 88 FF 5D 09 ED BE 5A 07 74 C1 E8 7D 2008-09-17 02:35 Sep 17 02:25:15 nike kernel: 0090: FF 76 BF 05 00 B4 41 BB AA 55 CD 13 72 33 81 FB 2008-09-17 02:35 Sep 17 02:25:15 nike kernel: 00A0: 55 AA 75 2D F6 C1 01 74 28 E8 62 FF 4C 8B 76 08 2008-09-17 02:35 Sep 17 02:25:15 nike kernel: 00B0: 89 36 1A 07 E8 57 FF 31 BE 12 07 B4 42 CD 13 BE 2008-09-17 02:35 Sep 17 02:25:15 nike kernel: 00C0: 97 07 73 33 31 C0 CD 13 4F 75 E9 BE 71 07 E9 7E 2008-09-17 02:35 Sep 17 02:25:15 nike kernel: 00D0: FF 8A 76 01 8B 4E 02 E8 34 FF 31 BB 00 7C B8 01 2008-09-17 02:35 Sep 17 02:25:15 nike kernel: 00E0: 02 57 CD 13 5F BE 9C 07 73 0D 31 C0 CD 13 4F 75 2008-09-17 02:35 Sep 17 02:25:15 nike kernel: 00F0: E6 BE 76 07 E9 58 FF E8 14 FF 3D 81 3E FE 7D 55 2008-09-17 02:35 qemu 2008-09-17 02:35 stil need to craft a better rootfs for it though 2008-09-17 02:36 so, was that a successful read or is that cruft? 2008-09-17 02:36 looks rather crufty 2008-09-17 02:36 0000000: faeb 215b 4d61 5a65 426f 6f74 5d00 60b4 ..![MaZeBoot].`. 
2008-09-17 02:36 0000010: 0ebb 0700 89e5 8b76 10ff 4610 8a04 fe04 .......v..F..... 2008-09-17 02:36 0000020: cd10 61c3 31c0 8ed0 bc00 7cfb e8df ff2a ..a.1.....|....* 2008-09-17 02:36 0000030: 8ed8 89e6 068e c057 bf00 06fc b900 01f3 .......W........ 2008-09-17 02:36 0000040: a5ea 5706 0000 56bb 0700 b40e cd10 5eac ..W...V.......^. 2008-09-17 02:36 0000050: 08c0 75f2 f4eb fde8 b4ff 5b89 c5bf be07 ..u.......[..... 2008-09-17 02:36 0000060: b104 e8a9 ff31 803d 8075 0b09 edbe 2207 .....1.=.u....". 2008-09-17 02:36 0000070: 75dd 89fd eb08 803d 00be 3f07 75d1 83c7 u......=..?.u... 2008-09-17 02:36 0000080: 10e2 dfe8 88ff 5d09 edbe 5a07 74c1 e87d ......]...Z.t..} 2008-09-17 02:36 0000090: ff76 bf05 00b4 41bb aa55 cd13 7233 81fb .v....A..U..r3.. 2008-09-17 02:36 00000a0: 55aa 752d f6c1 0174 28e8 62ff 4c8b 7608 U.u-...t(.b.L.v. 2008-09-17 02:36 00000b0: 8936 1a07 e857 ff31 be12 07b4 42cd 13be .6...W.1....B... 2008-09-17 02:36 00000c0: 9707 7333 31c0 cd13 4f75 e9be 7107 e97e ..s31...Ou..q..~ 2008-09-17 02:36 00000d0: ff8a 7601 8b4e 02e8 34ff 31bb 007c b801 ..v..N..4.1..|.. 2008-09-17 02:36 00000e0: 0257 cd13 5fbe 9c07 730d 31c0 cd13 4f75 .W.._...s.1...Ou 2008-09-17 02:36 00000f0: e6be 7607 e958 ffe8 14ff 3d81 3efe 7d55 ..v..X....=.>.}U 2008-09-17 02:36 it worked! 2008-09-17 02:36 ooh, Mazeboot 2008-09-17 02:36 so it did perform the read from loop 2008-09-17 02:37 that's a hand crafted lba capable boot sector 2008-09-17 02:37 that I should get around to sending to hpa 2008-09-17 02:37 hpa? 
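The dmesg rows above come from the junkfs loop that prints an offset with "%04X:" and then each byte with " %02X" across many separate printk() calls. A minimal userspace sketch of the same row layout follows; it builds each row in a buffer first so the row can be emitted with a single call. format_hex_row is a made-up name. The kernel also has a print_hex_dump() helper in lib/hexdump.c that may be worth checking before hand-rolling this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one dump row in the junkfs layout: "%04X:" for the offset,
 * then " %02X" for each of up to 16 bytes.
 * Returns the row length, or -1 if out is too small for the whole row.
 */
static int format_hex_row(char *out, size_t out_len, unsigned int offset,
                          const unsigned char *data, size_t n)
{
    if (n > 16)
        n = 16;
    int pos = snprintf(out, out_len, "%04X:", offset);
    for (size_t i = 0; i < n; i++) {
        if (pos < 0 || (size_t)pos >= out_len)
            return -1;
        pos += snprintf(out + pos, out_len - (size_t)pos, " %02X", data[i]);
    }
    if (pos < 0 || (size_t)pos >= out_len)
        return -1;
    return pos;
}
```

Fed the first 16 bytes of the MaZeBoot sector at offset 0, this produces the same text as the first dmesg row.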
2008-09-17 02:37 hpa@zytor.com 2008-09-17 02:37 what I thought 2008-09-17 02:37 I think is his nick, he's syslinux guy 2008-09-17 02:37 oh yes 2008-09-17 02:38 anyway, so the bio part mostly worked 2008-09-17 02:38 it must be a very special boot sector 2008-09-17 02:38 most likely locking got messed up somewhere 2008-09-17 02:38 or kfree happened to quickly 2008-09-17 02:38 you should post code around now 2008-09-17 02:38 give it a little more fiddling, then post 2008-09-17 02:38 hmm 2008-09-17 02:38 I'll post now 2008-09-17 02:38 :) 2008-09-17 02:38 since it'll take a while to get a rootfs 2008-09-17 02:39 static int junkfs_get_sb(struct file_system_type *fs_type, int flags, const char *dev_name, void *data, struct vfsmount *mnt) 2008-09-17 02:39 { 2008-09-17 02:39 return get_sb_bdev(fs_type, flags, dev_name, data, junkfs_fill_super, mnt); 2008-09-17 02:39 } 2008-09-17 02:39 standard stuff 2008-09-17 02:39 i'll skip all the other stuff 2008-09-17 02:39 the important stuff is all in junkfs_fill_super 2008-09-17 02:39 ah, I thought this was your bio thing 2008-09-17 02:40 well 2008-09-17 02:40 I guess it is 2008-09-17 02:40 struct mz_t { 2008-09-17 02:40 wait_queue_head_t wq; 2008-09-17 02:40 int completed; 2008-09-17 02:40 }; 2008-09-17 02:40 nice and simple 2008-09-17 02:40 the struct to stick in the bio to wait on and mark completion 2008-09-17 02:40 static void end_io_read(struct bio *bio, int err) 2008-09-17 02:40 { 2008-09-17 02:40 struct mz_t * mzp; 2008-09-17 02:40 DBG_ENTER0(); 2008-09-17 02:40 mzp = (struct mz_t *)bio->bi_private; 2008-09-17 02:40 mzp->completed = 1; 2008-09-17 02:40 bio_put(bio); 2008-09-17 02:40 you can post to tux3 list 2008-09-17 02:40 wake_up(&mzp->wq); 2008-09-17 02:40 DBG_RETURN0(); 2008-09-17 02:40 } 2008-09-17 02:40 don't be shy :) 2008-09-17 02:40 -!- pgquiles(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-09-17 02:40 the end
completion 2008-09-17 02:41 I'm shy. 2008-09-17 02:41 yes, good 2008-09-17 02:41 don't be 2008-09-17 02:41 I want to get a non-crashing version first 2008-09-17 02:41 optional, but if that's your fav... 2008-09-17 02:41 anyway the above just grabs the private part, marks it as completed, and wakes up the wq 2008-09-17 02:41 yes 2008-09-17 02:41 the bio_put might be the problem... 2008-09-17 02:41 it's right 2008-09-17 02:41 um 2008-09-17 02:41 might not, not sure 2008-09-17 02:42 should be ok 2008-09-17 02:42 #define SB_SIZE 512 2008-09-17 02:42 static int junkfs_fill_super(struct super_block *sb, void *data, int silent) 2008-09-17 02:42 { 2008-09-17 02:42 struct mz_t *mz; 2008-09-17 02:42 u8 *buf; 2008-09-17 02:42 struct bio *bio; 2008-09-17 02:42 int err; 2008-09-17 02:42 int s, y, x; 2008-09-17 02:42 DBG_ENTER0(); 2008-09-17 02:42 mz = kmalloc(sizeof(struct mz_t), GFP_KERNEL); 2008-09-17 02:42 if (IS_ERR(mz)) { 2008-09-17 02:42 err = PTR_ERR(mz); 2008-09-17 02:42 goto out_mz_null; 2008-09-17 02:42 }; 2008-09-17 02:42 init_waitqueue_head(&mz->wq); 2008-09-17 02:42 mz->completed = 0; 2008-09-17 02:42 buf = kmalloc(SB_SIZE, GFP_KERNEL); 2008-09-17 02:42 if (IS_ERR(buf)) { 2008-09-17 02:42 err = PTR_ERR(buf); 2008-09-17 02:42 goto out_sb_null; 2008-09-17 02:42 }; 2008-09-17 02:42 bio = bio_alloc(GFP_KERNEL, 1); 2008-09-17 02:42 if (IS_ERR(bio)) { 2008-09-17 02:42 err = PTR_ERR(bio); 2008-09-17 02:42 goto out_bio_null; 2008-09-17 02:42 }; 2008-09-17 02:42 then we have the last function remaining - still very crufty 2008-09-17 02:42 basically, up till this point it's all kmalloc's 2008-09-17 02:43 bio->bi_bdev = sb->s_bdev; 2008-09-17 02:43 bio->bi_sector = 0; // first sector 2008-09-17 02:43 what we actually want to read -
from sector 0 on our bdev 2008-09-17 02:43 if (bio_add_page(bio, virt_to_page(buf), SB_SIZE, offset_in_page(buf)) == SB_SIZE) { 2008-09-17 02:43 bio->bi_end_io = end_io_read; 2008-09-17 02:43 bio->bi_private = mz; 2008-09-17 02:43 submit_bio(READ, bio); 2008-09-17 02:43 s = wait_event_interruptible(mz->wq, mz->completed); 2008-09-17 02:43 DBG_MARK1(int, s); 2008-09-17 02:43 for (y = 0; y < 16; ++y) { 2008-09-17 02:43 printk(KERN_INFO "%04X:", y * 16); 2008-09-17 02:43 for(x = 0; x < 16; ++x) { 2008-09-17 02:43 printk(" %02X", buf[y * 16 + x]); 2008-09-17 02:43 }; 2008-09-17 02:43 printk("\n"); 2008-09-17 02:43 }; 2008-09-17 02:43 } else { 2008-09-17 02:43 DBG_MARK0(); 2008-09-17 02:43 }; 2008-09-17 02:43 err = -1; 2008-09-17 02:44 there's the actual read, wait on wq, dump to dmesg 2008-09-17 02:44 ah, you did virt_to_page, that's how it worked 2008-09-17 02:44 forget that 2008-09-17 02:44 obviously the dump happened, and already had the proper content 2008-09-17 02:44 just to page = alloc_pages 2008-09-17 02:44 and use the page head directly, like all the other hacks 2008-09-17 02:44 kmallocing that will get you shouted at, trust me 2008-09-17 02:44 this is cleaner, I'm actually allocating exactly how much I need 2008-09-17 02:44 nope 2008-09-17 02:44 it's not good 2008-09-17 02:45 well 2008-09-17 02:45 unless you have an _actual_ other user of the page 2008-09-17 02:45 false economy 2008-09-17 02:45 so, you're saying grabing an empty page is cheaper? especially since it'll be returned soon anyway?
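The read path pasted above is the usual async-to-sync pattern: submit the bio, then sleep until the end_io callback marks completion. A few details are worth checking against the kernel source rather than taking from this note. kmalloc() and bio_alloc() report failure by returning NULL, not an ERR_PTR, so IS_ERR() checks will not catch a failed allocation. wait_event_interruptible() can return -ERESTARTSYS on a signal while the bio is still in flight, after which freeing mz leaves end_io_read() writing to freed memory. And because end_io_read() sets completed before calling wake_up(), a waiter that sees the flag early could free mz while wake_up() is still using mz->wq. The kernel's struct completion (init_completion(), complete(), wait_for_completion() in linux/completion.h) updates its flag while holding the wait queue lock, which closes that window. Below is a userspace analogy only, with a pthread mutex and condition variable standing in for the wait queue; all demo_ names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct mz_t: a flag plus something to sleep on. */
struct demo_completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void demo_completion_init(struct demo_completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

static void demo_completion_destroy(struct demo_completion *c)
{
    pthread_mutex_destroy(&c->lock);
    pthread_cond_destroy(&c->cond);
}

/* Completion side, like end_io_read(): the flag is set and the wakeup
 * issued under the same lock the waiter uses to test the flag. */
static void demo_complete(struct demo_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);   /* wake_up_all() flavour */
    pthread_mutex_unlock(&c->lock);
}

/* Like an uninterruptible wait_event(): the condition is rechecked after
 * every wakeup, so spurious wakeups are harmless. */
static void demo_wait(struct demo_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Fake asynchronous read: another thread "finishes the I/O". */
struct demo_io {
    struct demo_completion done;
    int result;
};

static void *demo_io_thread(void *arg)
{
    struct demo_io *io = arg;
    io->result = 512;                   /* pretend 512 bytes were read */
    demo_complete(&io->done);
    return NULL;
}

/* Submit, then block until completion: async I/O made synchronous. */
static int demo_submit_and_wait(void)
{
    struct demo_io io;
    pthread_t tid;

    demo_completion_init(&io.done);
    io.result = 0;
    if (pthread_create(&tid, NULL, demo_io_thread, &io) != 0) {
        demo_completion_destroy(&io.done);
        return -1;
    }
    demo_wait(&io.done);
    /* Join before tearing down io, so the completer can no longer touch it. */
    pthread_join(tid, NULL);
    demo_completion_destroy(&io.done);
    return io.result;
}
```

With exactly one known waiter, pthread_cond_signal() would do as well, mirroring the wake_up() versus wake_up_all() discussion earlier in the log.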
2008-09-17 02:45 it's very cheap 2008-09-17 02:45 kfree(bio); 2008-09-17 02:45 bio = NULL; 2008-09-17 02:45 out_bio_null: 2008-09-17 02:45 kfree(sb); 2008-09-17 02:45 sb = NULL; 2008-09-17 02:45 out_sb_null: 2008-09-17 02:45 kfree(mz); 2008-09-17 02:45 and it's the superblock 2008-09-17 02:45 mz = NULL; 2008-09-17 02:45 out_mz_null: 2008-09-17 02:46 DBG_RETURN1(int, err); 2008-09-17 02:46 } 2008-09-17 02:46 and that's it 2008-09-17 02:46 it deserves a page of its own 2008-09-17 02:46 and this apparently reads and dumps correctly, but still has some nasty bug in it 2008-09-17 02:46 anyway, it looks good 2008-09-17 02:46 nothing to be shy about 2008-09-17 02:46 maybe the kfree(bio)? 2008-09-17 02:46 you can post that to the tux3 list 2008-09-17 02:46 yep 2008-09-17 02:46 don't do that 2008-09-17 02:46 you need to give that bio back to the bio system 2008-09-17 02:46 since bio_put already freed it? 2008-09-17 02:46 via bio_put 2008-09-17 02:46 right 2008-09-17 02:47 so I need to bio_put on the error path 2008-09-17 02:47 you might want to put some trace output in bio_put 2008-09-17 02:47 not on the normal return path? 2008-09-17 02:47 bio_endio should put the bio 2008-09-17 02:47 I forget 2008-09-17 02:47 very forgettable detail 2008-09-17 02:47 you need to look at the source 2008-09-17 02:48 well another endio func did bio_put 2008-09-17 02:48 hence I did as well 2008-09-17 02:48 there's a bio error mechanism too 2008-09-17 02:48 so kfree(bio) is the problem 2008-09-17 02:48 you can endio when you have an error 2008-09-17 02:48 and it will put the bio 2008-09-17 02:48 don't need to do it on a separate path 2008-09-17 02:48 right endio(bio,err) 2008-09-17 02:49 the big deal is, your wait queue and wake worked 2008-09-17 02:49 that's fun, hmm?
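The kfree(bio) problem is about ownership. bio_alloc() hands back a bio holding one reference, bio_get() takes another, and bio_put() drops one and releases the bio when the count reaches zero. Once end_io_read() has called bio_put() on the only reference, the submitter owns nothing, and kfree() is never the right way to release a bio in any case. A userspace sketch of the rule follows, using a hypothetical demo_req with a plain int count where the kernel uses an atomic counter.

```c
#include <assert.h>
#include <stdlib.h>

static int demo_frees;          /* counts real frees, for checking */

/* Hypothetical refcounted request standing in for struct bio. */
struct demo_req {
    int refcount;
};

static struct demo_req *demo_req_alloc(void)
{
    struct demo_req *r = malloc(sizeof(*r));
    if (!r)                     /* like kmalloc(): failure is NULL */
        return NULL;
    r->refcount = 1;            /* caller owns the first reference */
    return r;
}

static void demo_req_get(struct demo_req *r)
{
    r->refcount++;
}

/* Drop one reference; the last put frees. Afterwards the caller must not
 * touch r again, and must never free it by hand. */
static void demo_req_put(struct demo_req *r)
{
    assert(r->refcount > 0);
    if (--r->refcount == 0) {
        free(r);
        demo_frees++;
    }
}

/* Completion handler analogue: consumes one reference, as
 * end_io_read() does with bio_put(). */
static void demo_end_io(struct demo_req *r)
{
    demo_req_put(r);
}

/* A submitter that still needs the request after completion takes its
 * own reference first and drops it with a put, never a free(). */
static int demo_submit_keep_ref(void)
{
    struct demo_req *r = demo_req_alloc();
    if (!r)
        return -1;
    demo_req_get(r);            /* extra reference for the submitter */
    demo_end_io(r);             /* completion drops its reference */
    int remaining = r->refcount; /* safe: we still hold one */
    demo_req_put(r);            /* final put frees exactly once */
    return remaining;
}
```

If the submitter never looks at the request after submitting it, the extra get/put pair is unnecessary: the completion's put is the only release.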
2008-09-17 02:49 powerful 2008-09-17 02:49 oh the wq was easy 2008-09-17 02:50 the only problem with wq, was the sometimes need for & or * 2008-09-17 02:52 I'm assuming writes would be just as simple 2008-09-17 02:54 yes 2008-09-17 02:54 it's symmetric 2008-09-17 02:59 anyway, once you have tracked down your double free I encourage you to post it to the tux3 list 2008-09-17 02:59 it's especially interesting now while it's still minimal 2008-09-17 03:01 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-17 03:01 argh 2008-09-17 03:01 okay so that kfree ain't the only problem in there 2008-09-17 03:01 now, really no more live testing 2008-09-17 03:02 anyway, once you have tracked down your double free I encourage you to post it to the tux3 list 2008-09-17 03:02 it's especially interesting now while it's still minimal 2008-09-17 03:02 will do so 2008-09-17 03:02 there's still a lot of stuff in there I'm not quite clear on 2008-09-17 03:03 so I think I'll devote a little more time into understanding what all these functions _do_... 2008-09-17 03:04 anyway, these bio's seem to be usable for aio pretty nicely 2008-09-17 03:04 inherently aio 2008-09-17 03:04 you have to work at it to make it sync 2008-09-17 03:04 right, that's what I meant 2008-09-17 03:05 once I get this working without crashing, and understand the code better, I'm going to need to create a content-less file system 2008-09-17 03:05 ie. ability to create/chmod/chown/etc files in pure ram without having content (all zero length) 2008-09-17 03:05 that will of course be a lot... 
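Earlier in the log flips summed up a bio as a vector of page segments moved to or from one physically contiguous range on disk, "just a preadv / pwritev", and here confirms that writes are symmetric. As a loose userspace analogy only (not kernel code), the sketch below reads one contiguous file range into two separate buffers with a single preadv() call; read_scattered and demo_read_scattered are made-up names. Using pwritev() with the same iovec array would give the write direction.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * One contiguous range of the "disk" (a file, starting at disk_offset) is
 * read into two unrelated memory segments, much as a bio lists
 * (page, offset, length) segments for a single starting sector.
 * Returns bytes read, or -1 on error.
 */
static ssize_t read_scattered(int fd, off_t disk_offset,
                              char *seg_a, size_t len_a,
                              char *seg_b, size_t len_b)
{
    struct iovec vec[2] = {
        { .iov_base = seg_a, .iov_len = len_a },
        { .iov_base = seg_b, .iov_len = len_b },
    };
    return preadv(fd, vec, 2, disk_offset);
}

/* Usage: store 16 known bytes in a temporary file, then fetch bytes 2..13
 * into a 4-byte and an 8-byte buffer with one call. */
static int demo_read_scattered(void)
{
    char path[] = "/tmp/bio_demo_XXXXXX";
    char a[4], b[8];
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* file goes away once fd is closed */
    if (write(fd, "0123456789ABCDEF", 16) != 16) {
        close(fd);
        return -1;
    }
    ssize_t n = read_scattered(fd, 2, a, sizeof a, b, sizeof b);
    close(fd);
    if (n != 12 || memcmp(a, "2345", 4) != 0 || memcmp(b, "6789ABCD", 8) != 0)
        return -1;
    return (int)n;
}
```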
2008-09-17 03:05 and no backing to disk 2008-09-17 03:06 so basically reimplement ramfs 2008-09-17 03:06 just take the tux3 checkin 2008-09-17 03:06 little point in doing otherwise 2008-09-17 03:06 was just planning on building on this 2008-09-17 03:06 you don't really want to reverse engineer the twisty thoughts of the vfs maintainer ;) 2008-09-17 03:06 sure I do 2008-09-17 03:07 I'll ask you again in two weeks ;) 2008-09-17 03:07 you can't write a good fs without understanding the twistyness of the layer above it 2008-09-17 03:07 ah 2008-09-17 03:07 and the layer beneath it 2008-09-17 03:07 but you can understand the twists without deriving them from first principle 2008-09-17 03:07 learning by trying is painful, but very efficient in the long term 2008-09-17 03:07 just saying, examples are what you want now 2008-09-17 03:08 you remember the errors you make along the way 2008-09-17 03:08 not really looking at the bits and imagining how they fit together 2008-09-17 03:08 well, I feel I need a real deep understanding of the vfs layer to even attempt to try what I would like to do 2008-09-17 03:08 you can do that to a certain extent 2008-09-17 03:09 but there is a high percentage of "arbitrary" inthe bit you're just about to go exploring 2008-09-17 03:09 not really, it's just the next on the list ;-) 2008-09-17 03:09 kill_litter_super ;) 2008-09-17 03:09 throw in some more block layer, and some networking, and a lot of memory management/userspace/mmap 2008-09-17 03:09 work till the end of the year if I'm lucky 2008-09-17 03:10 and have the time 2008-09-17 03:10 yeah, seen kill_little_super, although haven't read it with understanding yet 2008-09-17 03:10 litter 2008-09-17 03:11 you meant litter? 2008-09-17 03:12 you did... 
2008-09-17 03:15 that does seem to be the one triggered by ramfs sb cleanup 2008-09-17 03:15 anyway, enough for tonight 2008-09-17 03:15 need to sleep 2008-09-17 03:17 me too 2008-09-17 03:17 think I found the bug 2008-09-17 03:18 kfree(sb) instead of kfree(buf) 2008-09-17 03:18 basically typo 2008-09-17 03:18 thinko 2008-09-17 03:37 -!- openblast(~quassel@static.230.173.47.78.clients.your-server.de) has joined #tux3 2008-09-17 03:47 -!- konrad(~konrad@c-24-16-74-109.hsd1.wa.comcast.net) has joined #tux3 2008-09-17 04:42 -!- kmeyer(~konrad@c-24-16-74-109.hsd1.wa.comcast.net) has joined #tux3 2008-09-17 08:00 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-17 08:09 -!- kbingham(~kbingham@92.9.151.25) has joined #tux3 2008-09-17 08:14 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-17 08:53 http://www.sciencedaily.com/releases/2008/09/080915105733.htm 2008-09-17 08:53 cool 2008-09-17 09:29 -!- Kirantpatil(~kiran@122.167.207.73) has joined #tux3 2008-09-17 09:42 http://en.gogloom.com/OFTC/tux3/ 2008-09-17 09:42 cool 2008-09-17 09:42 nice ! 
2008-09-17 09:53 -!- pgquiles(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-09-17 10:58 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-17 11:01 -!- Kirantpatil(~kiran@122.167.183.230) has joined #tux3 2008-09-17 11:01 -!- Kirantpatil(~kiran@122.167.183.230) has left #tux3 2008-09-17 12:50 tux3 on linuxtoday: http://www.linuxtoday.com/ 2008-09-17 12:51 http://www.linuxtoday.com/developer/2008091702135NWKN 2008-09-17 12:52 the phoronix mention is nice too: http://phoronix.com/forums 2008-09-17 12:53 http://www.phoronix.com/scan.php?page=news_item&px=NjcyNQ 2008-09-17 13:42 sync bitmap 2008-09-17 13:42 filemap_blockio: write <0:0> 2008-09-17 13:42 filemap_blockio: egad, wrote a clean buffer 2008-09-17 13:42 yay for driveby debug checks 2008-09-17 13:46 ah, it's because the dirty buffer is set clean before being written 2008-09-17 13:46 which is ok 2008-09-17 13:46 but the buffer emulation probably should have a writeback state 2008-09-17 13:46 like kernel 2008-09-17 13:46 or maybe that is wanking 2008-09-17 13:47 just turn off the debug check for now I guess 2008-09-17 13:47 maybe turn it into a check for writing out a non-uptodate buffer 2008-09-17 13:48 if (buffer_empty(buffer)) 2008-09-17 13:48 warn("egad, wrote an invalid buffer"); 2008-09-17 14:31 struct buffer *nextbuf = findblk(buffer->map, ends[down] + (int[2]){ 1, -1 }[down] ); 2008-09-17 14:31 hurt anybody's eyes? 2008-09-17 14:33 struct buffer *nextbuf = findblk(buffer->map, ends[down] + (down ? 
-1 : 1)); <- little less barbaric 2008-09-17 14:33 I don't know 2008-09-17 14:33 (int[2]){1,-1}[x] is pretty cute 2008-09-17 14:33 yup 2008-09-17 14:33 these days it's the slower of the two 2008-09-17 14:33 kinda shocking for us oldtimers 2008-09-17 14:34 1-2*!!down 2008-09-17 14:34 second will compile to a cmov 2008-09-17 14:34 or hmm 2008-09-17 14:34 you'd better hope the compiler optimizes that ;) 2008-09-17 14:34 it does 2008-09-17 14:34 but 2008-09-17 14:34 the condex is optimal 2008-09-17 14:34 for any proc with cmov or equivalent, which is pretty much all these days 2008-09-17 14:35 yeah 2008-09-17 14:35 cmov is a great 'invention' 2008-09-17 14:35 amazing it took so freakin' long 2008-09-17 14:35 I'd like to write (down ? + : -)1 2008-09-17 14:35 why can't I? 2008-09-17 14:35 lol 2008-09-17 14:35 that's wicked 2008-09-17 14:36 (down ? (+) :( -))1 2008-09-17 14:36 make it unambiguous 2008-09-17 14:36 (int)(ror 1,!down) 2008-09-17 14:37 not quite 2008-09-17 14:37 you need to propagate the sign all the way 2008-09-17 14:37 [by this point I'm not sure if we want -1 +1 or +1 -1 2008-09-17 14:37 oh right 2008-09-17 14:38 tricky 2008-09-17 14:38 cmov will win 2008-09-17 14:39 anyway, enough wan^review, got to push this extent maker a little further 2008-09-17 14:39 or down, down; adc ax,ax; leal (ax,ax,-1),ax 2008-09-17 14:39 nah 2008-09-17 14:39 the or sets z not c 2008-09-17 14:40 cmov will clean its clock 2008-09-17 14:40 true 2008-09-17 14:40 even when it's working 2008-09-17 14:40 although 2008-09-17 14:41 (down ? 
ends[down] - 1 : ends[0] + 1) 2008-09-17 14:41 will probably be better 2008-09-17 14:42 it'll probably end up as a compute in parallel and cmov to select 2008-09-17 14:42 this is a pure idiocy though 2008-09-17 14:42 it doesn't matter 2008-09-17 14:43 right, but nice 2008-09-17 14:43 you're a born demo coder 2008-09-17 14:51 if (ends[1] - ends[0]) 2008-09-17 14:51 printf("extent from %x to %x\n", ends[0], ends[1]); 2008-09-17 14:51 works, seems to 2008-09-17 14:51 time to check in 2008-09-17 14:53 ends[up] = next; <- reads kind of cutely 2008-09-17 14:55 for (int up = 0, sign = -1; up < 2; up++, sign = -sign) { 2008-09-17 14:55 the most efficient of all 2008-09-17 14:55 so far 2008-09-17 15:09 there we go, a checkin 2008-09-17 15:09 that gives me the moral right to go for a skate 2008-09-17 15:09 early skate today 2008-09-17 15:09 in honor of the cabal meeting 2008-09-17 15:20 mmm, sushi for breakfast 2008-09-17 15:35 -!- kbingham(~kbingham@92.9.135.11) has joined #tux3 2008-09-17 16:12 http://www.phoronix.com/forums/showthread.php?t=12704 2008-09-17 16:13 "either we will still be using ext5-6-7 in the future but with new ideas that were proven to be valuable by other projects like Tux3 or we might actually see a shift towards a completely new filesystem like tux3" 2008-09-17 16:14 "Lot's of information here though: http://shapor.com/tux3/shapor-tux3/doc/design.html" 2008-09-17 16:14 hehe 2008-09-17 16:15 there's a ringer in the thread 2008-09-17 17:02 folks 2008-09-17 17:10 reading fanboy mail ? 
:) 2008-09-17 21:24 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-17 21:56 -!- tim_dimm_(~mobile@32.174.56.165) has joined #tux3 2008-09-17 21:57 Greetings from the cabal 2008-09-17 21:59 Flips proposed a mellowing, Shapor wants swear words 2008-09-17 22:00 New sys call: 2008-09-17 22:00 un_fuck 2008-09-17 22:02 Cabal suggest sys_unfuck 2008-09-18 02:48 -!- kbingham(~kbingham@92.10.191.55) has joined #tux3 2008-09-18 03:35 folks 2008-09-18 03:35 not much irc traffic today 2008-09-18 03:35 how's it going ? 2008-09-18 03:44 -!- data(~data@echo489.server4you.de) has joined #tux3 2008-09-18 04:03 today was pretty busy 2008-09-18 04:03 off channel action irl 2008-09-18 04:04 a sys_unfuck syscall was proposed, and useful work was also done 2008-09-18 04:05 irl ? 2008-09-18 04:05 in real life 2008-09-18 04:05 ok 2008-09-18 04:05 good, cabal meeting of sorts ? 2008-09-18 04:07 full blown 2008-09-18 04:07 oh really ? unannounced ? 2008-09-18 04:07 true 2008-09-18 04:07 who was there ? 2008-09-18 04:07 flips: are you getting private /msg ? 2008-09-18 04:07 can't say it was a cabal meeting 2008-09-18 04:07 ok 2008-09-18 04:09 regarding extents ? 
2008-09-18 04:10 one thing indeed 2008-09-18 04:10 coding right now 2008-09-18 04:10 tricky 2008-09-18 04:10 yeah 2008-09-18 04:16 ok night 2008-09-18 04:17 surprised you're up this late still 2008-09-18 04:18 me too 2008-09-18 07:16 -!- Kirantpatil(~kiran@122.167.223.69) has joined #tux3 2008-09-18 07:16 -!- Kirantpatil(~kiran@122.167.223.69) has left #tux3 2008-09-18 07:57 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-18 08:36 -!- openblast(~quassel@static.230.173.47.78.clients.your-server.de) has joined #tux3 2008-09-18 08:57 -!- openblast(~quassel@static.230.173.47.78.clients.your-server.de) has joined #tux3 2008-09-18 09:21 -!- kbingham(~kbingham@92.20.194.187) has joined #tux3 2008-09-18 10:15 -!- tim_dimm_(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-18 10:20 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-18 10:24 -!- tim_dimm_(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-18 10:42 -!- kbingham(~kbingham@92.20.194.187) has joined #tux3 2008-09-18 10:47 -!- konrad(~konrad@D-128-208-53-196.dhcp4.washington.edu) has joined #tux3 2008-09-18 11:00 top 2008-09-18 11:57 -!- pgquiles(~pgquiles@50.Red-79-153-248.staticIP.rima-tde.net) has joined #tux3 2008-09-18 13:53 flips: btrfs claims to eventually have online disk checking 2008-09-18 13:54 a coworker just attended a btrfs talk 2008-09-18 16:17 dwalk_next is hard to write 2008-09-18 16:17 given some context already set up, returns the next extent from a dleaf 2008-09-18 16:18 probably will turn into a post to the list 2008-09-18 16:18 big complexity in a small corner 2008-09-18 16:18 as expected, actually 2008-09-18 16:56 hey 2008-09-18 17:02 pong 2008-09-18 17:02 how's it going ? 2008-09-18 19:07 -!- ChanServ changed mode/#tux3 -> +o flips 2008-09-18 19:07 -!- flips changed topic to "Tux3 list membership roars past 100! ~ http://tux3.org ~ Tux3 U, right here Tue and Thur 8 p.m. 
Pacific Time ~ Next session: bio level data transfer" 2008-09-18 19:08 -!- flips changed topic to "Tux3 list membership roars past 100! ~ http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 p.m. Pacific Time ~ Next session: bio level data transfer" 2008-09-18 19:08 maze, ping 2008-09-18 19:19 -!- flips changed topic to "Tux3 list membership roars past 100! ~ http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 p.m. Pacific Time ~ Next session: bio level data transfer ~ Seinfeld ads canned, thanks for small mercies" 2008-09-18 19:19 -!- flips changed mode/#tux3 -> -o flips 2008-09-18 19:27 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-18 19:31 I figure if I make myself a new cuppa dark french right now have a fighting chance of getting streaming dleaf read working by midnight 2008-09-18 19:31 maybe even write 2008-09-18 19:31 ACTION takes action on that item 2008-09-18 19:34 ACTION is browsing LDD a little... 2008-09-18 19:48 -!- BSD(~bandan@pool-71-174-177-86.bstnma.east.verizon.net) has joined #tux3 2008-09-18 19:52 -!- Kirantpatil(~kiran@122.167.219.189) has joined #tux3 2008-09-18 19:53 -!- Kirantpatil(~kiran@122.167.219.189) has left #tux3 2008-09-18 19:53 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-18 19:55 Um.. How do I clone the git ddtree ? 2008-09-18 19:55 tried git clone? 2008-09-18 19:55 on the url I posted? 2008-09-18 19:55 Ya I mean what's the URL ? Sorry I probably missed it :( 2008-09-18 19:56 in a message somewhere, "tux3 report: what's next" 2008-09-18 19:56 alternatively, go to phunq.net/ddtree 2008-09-18 19:57 has gitweb and everything 2008-09-18 19:58 git clone http://phunq.net/tux3fs is what I tried 2008-09-18 19:58 it would be nice it git just worked 2008-09-18 19:59 like mercurial 2008-09-18 19:59 kay 2008-09-18 19:59 hmm.. 
2008-09-18 19:59 a matter of getting the url right 2008-09-18 19:59 I think it gets confused by symlinks 2008-09-18 20:00 Yay I will just do it with hg, never mind :) 2008-09-18 20:00 git is just the kernel part 2008-09-18 20:00 you don't need that right now 2008-09-18 20:01 so mercurial 2008-09-18 20:01 nice nick 2008-09-18 20:01 :) 2008-09-18 20:03 I'll clean up the git cloneability later 2008-09-18 20:03 Thanks! 2008-09-18 20:03 manshack underwent a major re-arrange 2008-09-18 20:03 just another point on the "merucial rules" curve I think 2008-09-18 20:03 yummy 2008-09-18 20:03 wow 2008-09-18 20:03 we started 3 minutes ago 2008-09-18 20:04 no maze 2008-09-18 20:04 so we will take a slight change in session plan 2008-09-18 20:04 instead of doing bio transfers we will continue drilling down into generic_write 2008-09-18 20:05 ok, somebody summarize where we got to, please... mention _2copy 2008-09-18 20:06 ACTION looks at RazvanM 2008-09-18 20:06 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2063 2008-09-18 20:06 and the summary? 2008-09-18 20:07 and we got there from here: http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2319 2008-09-18 20:07 the 2copy is used when there is no support for write_begin 2008-09-18 20:08 what is happening in this function? 2008-09-18 20:08 and we use prepare_Write and commit_write 2008-09-18 20:09 the data is moved to some kernel pages and then to some user memory? :P 2008-09-18 20:09 hi all 2008-09-18 20:09 hi 2008-09-18 20:09 ACTION takes a seat at the back of the room 2008-09-18 20:09 the data is moved from user memory onto buffer pages 2008-09-18 20:10 then the buffer pages are committed to disk 2008-09-18 20:10 sorry... 
I got the order wrong :P 2008-09-18 20:10 2copy is the lamest name anybody could have possibly chosen :p 2008-09-18 20:10 appears to be the real thing though 2008-09-18 20:11 just where we should be reading 2008-09-18 20:11 __grab_cache_page is the heart of it 2008-09-18 20:11 other things are decoration 2008-09-18 20:11 such as fault_in_readable 2008-09-18 20:12 just a quick q: why some functions start with uppercase? 2008-09-18 20:12 attempts to deal with the many dangerous recursions 2008-09-18 20:12 with varying degrees of success in terms of robustness and readability 2008-09-18 20:12 razvanm, random hackers 2008-09-18 20:12 what is write_begin? 2008-09-18 20:12 sometimes have studly caps days 2008-09-18 20:12 hey 2008-09-18 20:13 write_begin is a hook for some specialized user I don't know about 2008-09-18 20:13 "completely general interface used in exactly one place" like as not 2008-09-18 20:13 or "homework for shapor" 2008-09-18 20:13 hey maze 2008-09-18 20:13 :) 2008-09-18 20:13 ok 2008-09-18 20:13 ok, we can return to the original session plan 2008-09-18 20:14 maze, the plan is for you to report your findings on basic bio transfers 2008-09-18 20:14 lol 2008-09-18 20:14 point to code (you might want to pastie it) 2008-09-18 20:14 uhm, lol 2008-09-18 20:14 how about I put a tar.gz up? 2008-09-18 20:14 don't copy in the channel unless it's 1/2 lines 2008-09-18 20:14 that too 2008-09-18 20:14 pastie is good, use your taste 2008-09-18 20:15 if you had it checked in you could point at urls 2008-09-18 20:15 so... remember to check in next time ;) 2008-09-18 20:15 uploading 2008-09-18 20:15 since your code is so short I'd suggest just pasting the whole thing 2008-09-18 20:16 http://m.a.z.e.pl/junkfs.tar.gz 2008-09-18 20:16 lol nice domain!
2008-09-18 20:16 really 2008-09-18 20:16 leet 2008-09-18 20:16 yeah, I own z.e.pl 2008-09-18 20:17 almost as cool as cr.yp.to 2008-09-18 20:17 so I also have m.a@z.e.pl 2008-09-18 20:17 heh 2008-09-18 20:17 "opened with ark" 2008-09-18 20:17 or m@z.e.pl - whichever you prefer 2008-09-18 20:17 ok, who has got the code open, and who not? 2008-09-18 20:17 me not 2008-09-18 20:18 ok, got it open 2008-09-18 20:18 ark works pretty fscking well 2008-09-18 20:18 I'm impressed 2008-09-18 20:18 mind you - this is very rough, and mostly was debugging plus getting it working 2008-09-18 20:18 I'm still not quite sure of everything, and although I fixed the last hang bug I found 2008-09-18 20:18 I haven't since tested 2008-09-18 20:18 so I'm not sure ;-) 2008-09-18 20:18 don't worry, shapor will hurt you if you get anything wrong 2008-09-18 20:19 lol 2008-09-18 20:19 ACTION wields axe 2008-09-18 20:19 so... where does the bio read setup start? 2008-09-18 20:20 do you want me answering? 2008-09-18 20:20 yes 2008-09-18 20:20 you should have been asking ;) 2008-09-18 20:20 hmm. 
2008-09-18 20:20 right 2008-09-18 20:20 so pretty much everything except super.c is either makefile or debug 2008-09-18 20:20 noticed 2008-09-18 20:21 and the bottom of super.c is pretty standard module init stuff 2008-09-18 20:21 nicely lindented 2008-09-18 20:21 for the moment we only care about the bio transfer 2008-09-18 20:21 and above that is the standard fs registering and fs_ops stuff 2008-09-18 20:22 and from there we get to junkfs_get_sb which calls into get_sb_bdev 2008-09-18 20:22 which calls junkfs_fill_super as a callback 2008-09-18 20:22 and that's where all the action is 2008-09-18 20:22 action :) 2008-09-18 20:22 get_sb_bdev also exclusively opens the block device for us, so that's nice 2008-09-18 20:22 finally, after 4 days of tux3 U 2008-09-18 20:22 at the point we enter into junkfs_fill_super, we have an exclusively opened block device 2008-09-18 20:23 which is passed in the superblock 2008-09-18 20:23 sb->s_bdev 2008-09-18 20:23 in junkfs_fill_super we then proceed to allocate memory for 3 basic objects 2008-09-18 20:23 1) memory to read in the 512 byte (SB_SIZE) superblock 2008-09-18 20:23 1 sector sb, leet 2008-09-18 20:23 2) an object to store state (in the bio->bi_private field) 2008-09-18 20:24 3) a bio 2008-09-18 20:24 1 and 2 are just normal kmalloc's 2008-09-18 20:24 3 is via bio_alloc 2008-09-18 20:24 thus 1 and 2 will need to be kfree'd 2008-09-18 20:24 -!- Bushman(~marcin@c-76-23-106-132.hsd1.sc.comcast.net) has joined #tux3 2008-09-18 20:24 and 3 will need to be bio_put'ed at some point before the end of junkfs_fill_super 2008-09-18 20:24 or we'll leak 2008-09-18 20:25 anyway, standard handling of error returns on all the allocs 2008-09-18 20:25 and we get to: 2008-09-18 20:25 bio->bi_bdev = sb->s_bdev; 2008-09-18 20:25 bio->bi_sector = 0; // first sector 2008-09-18 20:25 s = bio_add_page(bio, virt_to_page(buf), SB_SIZE, offset_in_page(buf)); 2008-09-18 20:25 which is most of the bio preparation stage 2008-09-18 20:25
Bushman: hi Marcin 2008-09-18 20:25 the real meat 2008-09-18 20:25 we set the bio to refer to the correct block device 2008-09-18 20:25 marcin, hi 2008-09-18 20:25 and (for now - this is all junkfs ;-) ) we just read the first sector 2008-09-18 20:26 sectors in new linux are always exactly 512 bytes 2008-09-18 20:26 that's leet nuff for us 2008-09-18 20:26 so we're saying here offset 0 * 512 into the block dev 2008-09-18 20:26 then we need to tell the bio where to store the data 2008-09-18 20:26 (or read from, since a write would be identical) 2008-09-18 20:26 right, struct bio is sector-addressed for no good reason 2008-09-18 20:26 s = bio_add_page(bio, virt_to_page(buf), SB_SIZE, offset_in_page(buf)) 2008-09-18 20:26 hello Daniel 2008-09-18 20:27 this actually gives our carefully allocated memory to the bio as memory 2008-09-18 20:27 bushman, enjoy ;) 2008-09-18 20:27 note that bio_add_page takes (bio, struct page*, len, ofs) 2008-09-18 20:27 i dunno if enjoy is the right word for kernel code just before bedtime ;) 2008-09-18 20:27 so we pass in the bio, then convert the bufs address to a page via virt_to_page 2008-09-18 20:27 and you could write it out in full in about as much code as the function call takes 2008-09-18 20:27 pass the length of the block 2008-09-18 20:28 and calc the offset from the page struct for the ofs via offset_in_page 2008-09-18 20:28 bushman, then just enjoy the geek banter 2008-09-18 20:28 virt_to_page? 2008-09-18 20:28 I'm assuming at this point that a kmalloc can't give us memory split across pages 2008-09-18 20:28 - not sure if this is correct 2008-09-18 20:28 shapor, great question 2008-09-18 20:28 maze, correct 2008-09-18 20:28 so buf was kmalloc'ed, so it's a virtual kernel memory address 2008-09-18 20:29 maze, unless the kmalloc is bigger than a page 2008-09-18 20:29 virt_to_page gives us the struct page * for the kaddr we pass to it 2008-09-18 20:29 [flips: of course] 2008-09-18 20:29 maze, and why do we need the struct page? 
2008-09-18 20:29 because that's what bios want 2008-09-18 20:29 if you look at what a bio is 2008-09-18 20:29 it's 3 things 2008-09-18 20:30 the struct bio 2008-09-18 20:30 which has a lot of management fields 2008-09-18 20:30 the bvec which 2008-09-18 20:30 is an array of a tiny struct with 3 fields 2008-09-18 20:30 { struct page * p; int len; int ofs; } 2008-09-18 20:31 so basically a list of where to put the next len bytes, specifying memory via page/ofs pairs 2008-09-18 20:31 this is for two reasons: 2008-09-18 20:31 [at least as far as i can tell] 2008-09-18 20:31 a) most hw (ie. stuff the blockdevice drivers care about) 2008-09-18 20:31 cares about physical addresses and not virtual kernel addresses 2008-09-18 20:31 right 2008-09-18 20:31 ie. for dma and all that good for performance goodness 2008-09-18 20:32 b) this can also be used for data xfr into userspace 2008-09-18 20:32 and there is no guarantee userspace memory has a mapping into kernel space 2008-09-18 20:32 [high mem] 2008-09-18 20:32 the big reason: scatter gather 2008-09-18 20:32 this is a dma interface in disguise 2008-09-18 20:32 very effective one 2008-09-18 20:32 this also makes it easier to coalesce physically neighboring memory together into the bvecs 2008-09-18 20:32 precisely 2008-09-18 20:33 right, another way of saying scatter gather 2008-09-18 20:33 notice that in bio_alloc 2008-09-18 20:33 we passed in a 1 2008-09-18 20:33 that 1 is the number of bvecs in the bvec area allocated to the bio 2008-09-18 20:33 so that limits how many non-contig pieces of memory we can have in the bio 2008-09-18 20:33 ah 2008-09-18 20:33 here - all we need is 1 2008-09-18 20:33 and because you did that, you could have initialized your one bvec with a simple structure assignment 2008-09-18 20:33 instead of the function call 2008-09-18 20:33 right. 2008-09-18 20:33 which does a bunch of stuff you don't need 2008-09-18 20:34 oh well. 2008-09-18 20:34 does a bio_vec describe exactly one page?
2008-09-18 20:34 maze, exactly 2008-09-18 20:34 no 2008-09-18 20:34 bv_len 2008-09-18 20:34 it describes a start page with ofset and a length 2008-09-18 20:34 the length may exceed that page and cross into however many next ones 2008-09-18 20:34 the precise rules for merging are overridable 2008-09-18 20:34 it describes a data region that resides within one page 2008-09-18 20:35 so the bio interface will be quite good for extents 2008-09-18 20:35 many device drivers have limits on how many sectors they can transfer in one go (ie. 200 or so) 2008-09-18 20:35 maze, you can't cross a page with a bvec 2008-09-18 20:35 flips, you sure? 2008-09-18 20:35 sadly, or perhaps sanely 2008-09-18 20:35 I certainly ain't ;-) 2008-09-18 20:36 pretty sure 2008-09-18 20:36 but then I don't know what I'm talking about here 2008-09-18 20:36 never seen it done ;) 2008-09-18 20:36 these are still all guesses 2008-09-18 20:36 pollacks ain't sane, just ask Shap 2008-09-18 20:36 I thought they merged by themselves 2008-09-18 20:36 hmm, well, first homework I;d guess 2008-09-18 20:36 one more q: bv_len is counting bytes or sectors? 
:P 2008-09-18 20:37 merging happens in the physical driver 2008-09-18 20:37 good question 2008-09-18 20:37 anyway bio_add_page returns how much it successfully added (or what the current total is, not sure) in bytes 2008-09-18 20:37 bytes I think 2008-09-18 20:37 so if everything is good it should be 512 at this point 2008-09-18 20:37 hence the check 2008-09-18 20:37 it's pretty badly braindamaged in that respect, counting in different units for no good reason 2008-09-18 20:37 if it doesn't match, we've got a problem - which mind you - AFAICT - can't happen 2008-09-18 20:37 and we bio_put to free the structure and basically error out 2008-09-18 20:38 [of course here we always error out, because this is junkfs (tm)] 2008-09-18 20:38 anyway if s==512 then we're good 2008-09-18 20:38 oh bv_len is definitely bytes 2008-09-18 20:38 we set up two more fields in the bio 2008-09-18 20:38 bi_end_io is the callback for when the bio is processed (or errors out) 2008-09-18 20:39 when the disk completion interrupt fires 2008-09-18 20:39 key point 2008-09-18 20:39 bi_private is a pointer to our data (the mz struct) so that we can figure out what we're talking about in the endio handler 2008-09-18 20:39 and then we submit the bio for READ 2008-09-18 20:39 now these (ie. bios) are inherently asynchronous 2008-09-18 20:40 so at this point it might have already completed - it could have been cached and come back immediately 2008-09-18 20:40 right...
it's the _only_ way to recover a memory context for a completed bio 2008-09-18 20:40 [I think] 2008-09-18 20:40 or we might need to wait some indeterminate amount of time 2008-09-18 20:40 it's much more direct than that 2008-09-18 20:40 here's where we make use of the waitqueue which we helpfully placed in the mz struct 2008-09-18 20:40 disk raises interrupt -> endio gets called 2008-09-18 20:40 in interrupt context 2008-09-18 20:40 this is as on the metal as you will get without going hypervisor 2008-09-18 20:41 oh, so basically end_io should do as little as feasibly possible 2008-09-18 20:41 preferably as simple as it is here 2008-09-18 20:41 yes 2008-09-18 20:41 again yes 2008-09-18 20:41 is it the right place to call bio_put ? 2008-09-18 20:41 though I often get excessive there ;) 2008-09-18 20:41 anyway, earlier on, we'd already initialized the waitqueue, so now we can just wait on it 2008-09-18 20:41 in the endio handler? 2008-09-18 20:42 except wait needs not only a waitqueue (wq) but also a condition 2008-09-18 20:42 [which it checks _first_] 2008-09-18 20:42 maze, _interruptible? 2008-09-18 20:42 hence mz struct also contains a boolean 2008-09-18 20:42 flips: yeah, no idea what the right choice is there, meaning to ask about this 2008-09-18 20:42 shapor, yes 2008-09-18 20:42 very important question 2008-09-18 20:42 flips, so how would it behave in a hypervisor? any changes? does it lose determinism? 2008-09-18 20:42 why does it matter? 2008-09-18 20:43 if interruptible, you better be prepared to field anything that can be thrown at you 2008-09-18 20:43 if uninterruptible, you'd better be able to prove it always completes 2008-09-18 20:43 is that the basis for atomicity then? 2008-09-18 20:43 so what could get thrown at us, and will the bio always complete? 
2008-09-18 20:43 flips: what happens if there is an error 2008-09-18 20:43 bushman, we don't touch hypervisors 2008-09-18 20:43 disk io error or something 2008-09-18 20:43 if we did, it would be to implement hard realtime or something 2008-09-18 20:43 hypervisors should be transparent to the os 2008-09-18 20:43 does the endio handler get called? 2008-09-18 20:44 yes endio has err parameter 2008-09-18 20:44 bushman, there is some sense of atomicity here in the interruptible/noninterrupble distinction 2008-09-18 20:44 loose sense 2008-09-18 20:44 just to finish off this (junkfs_fill_super) function, we then dump the superblock via printk and free everything and return an error (junkfs remember.?) 2008-09-18 20:44 maze, in kernel interrupts don't just happen, you have to ask for them 2008-09-18 20:45 even with preemption 2008-09-18 20:45 ? 2008-09-18 20:45 or they get fielded on syscall exit 2008-09-18 20:45 SHOULD be transparrent, but since most of them mangle time into nonlinear, doesnt it screw up our predictions when interrupt is gonna finish? 2008-09-18 20:45 task switch is not interrupt 2008-09-18 20:45 it's caused by an interrupt 2008-09-18 20:45 oh i see you just aren't checking the err parameter in end_io_read 2008-09-18 20:45 you can get a task switch even with wait_uninterruptible 2008-09-18 20:45 probably should ;) 2008-09-18 20:45 so while in kernel space, my thread of execution is guaranteed not get interrupted by anything? 2008-09-18 20:46 right I should ;-) 2008-09-18 20:46 all that means is, an interrupt won't cause the wait to bail early 2008-09-18 20:46 you have to wrap your interruptible wait in a loop 2008-09-18 20:46 or write uninterruptible 2008-09-18 20:46 so interruptible here refers to what? can be interrupted by killing the mount process? 
2008-09-18 20:46 which is probably what you want here 2008-09-18 20:46 just means the wait may bail before the wak 2008-09-18 20:46 wake 2008-09-18 20:47 so has to be in a loop, and you can't assume that what you were waiting for actually happened 2008-09-18 20:47 so i guess the big question here is how do we guarantee that the write is gonna complete? 2008-09-18 20:47 so I'd want uninterruptible? or interruptible and then on some interrupts somehow cancel and free the bio 2008-09-18 20:47 just write uninterruptible until you know kernel scheduling better ;) 2008-09-18 20:47 (read here) 2008-09-18 20:47 uninterruptable will cause it to be D too iirc 2008-09-18 20:47 bushman, it always completes 2008-09-18 20:47 D state 2008-09-18 20:48 with or without an error 2008-09-18 20:48 Bushman: it may complete with an error 2008-09-18 20:48 which gets passed to the endio handler 2008-09-18 20:48 yes, this is d state, the real thing 2008-09-18 20:48 which as written ignores all errors, and just marks the io as completed, frees the bio, and wakes the wq 2008-09-18 20:48 interruptable is not quite so severe i guess 2008-09-18 20:48 you are in d state any time you're waiting in kernel 2008-09-18 20:48 even interruptable? 2008-09-18 20:48 yes 2008-09-18 20:48 unless you're doing wait_interruptible? 2008-09-18 20:49 hmm 2008-09-18 20:49 flips: didn't we find that not to be the case 2008-09-18 20:49 with ddsnap 2008-09-18 20:49 even then I think 2008-09-18 20:49 hmm, so how could I get this to be abortable, in case for example the block device hangs on network? 2008-09-18 20:49 remember our threads were all D state 2008-09-18 20:49 you get a qualifier on your ps output 2008-09-18 20:49 until we changed it to interruptable 2008-09-18 20:50 maze, that's not your job, it's the job of the device insert/remove 2008-09-18 20:50 which of course means it's badly mismanaged ;) 2008-09-18 20:50 but... 
2008-09-18 20:50 not your problem for now 2008-09-18 20:50 well what if we're running this off of a nbd or something like that, and the network gets pulled 2008-09-18 20:50 would the bio then just (eventually) return with an error to endio? 2008-09-18 20:50 that's nbd's problem 2008-09-18 20:50 again not yours 2008-09-18 20:51 you can try to do timeouts and things, but you're risking redundancy 2008-09-18 20:51 and confusion 2008-09-18 20:51 right 2008-09-18 20:51 risking redundancy ? 2008-09-18 20:51 duplicating functionality that is better performed at some other layer 2008-09-18 20:52 constant risk with the blind leading the blind ;) 2008-09-18 20:52 yeah 2008-09-18 20:52 good point 2008-09-18 20:52 but the blind leading the deaf is ok 2008-09-18 20:52 maze, that was a great walkthrough, and the code is great too 2008-09-18 20:52 yes! 2008-09-18 20:52 not perfect, but you don't need that to be great in linux ;) 2008-09-18 20:52 I still don't quite understand a bunch of it 2008-09-18 20:52 MaZe: thanks, i was following closely with little time to type 2008-09-18 20:53 a few warts make it more real, like a european movie 2008-09-18 20:53 hah 2008-09-18 20:53 ACTION rolls eyeballs 2008-09-18 20:53 lol 2008-09-18 20:53 maze, I am going to cut and paste your code into fs/tux3/super.c 2008-09-18 20:53 and tux3 is going to read a leet sector sized sb too 2008-09-18 20:54 heh 2008-09-18 20:54 s/junkfs/tux3/ 2008-09-18 20:54 hehe 2008-09-18 20:54 exactly 2008-09-18 20:54 or s/tux3/junkfs/ 2008-09-18 20:54 depending on leetness or lack of it 2008-09-18 20:54 so it seems silly for every fs to have to do this 2008-09-18 20:54 is the vfs totally useless?
2008-09-18 20:54 yes 2008-09-18 20:55 pretty much 2008-09-18 20:55 what I still haven't found is how to specify the io priority of the bio you submit 2008-09-18 20:55 pretty close 2008-09-18 20:55 not completely 2008-09-18 20:55 lame but not useless 2008-09-18 20:55 better than NT 2008-09-18 20:55 I'm assuming it inherits from the ionice'ness of the process in whose context you're running 2008-09-18 20:55 maze, completely separate 2008-09-18 20:55 it's part of the elevator abstraction 2008-09-18 20:55 oh? 2008-09-18 20:56 huh? 2008-09-18 20:56 i was wondering that too 2008-09-18 20:56 inheriting anything is completely a property of the elevator plugin 2008-09-18 20:56 shouldn't submitting a read/write request to a blockdevice be exactly when this matters? 2008-09-18 20:56 see "request queue" 2008-09-18 20:56 oh, the mysterious q parameter 2008-09-18 20:56 one of the harder code reading projects in kernel 2008-09-18 20:56 it's a mess 2008-09-18 20:56 I saw all over the place 2008-09-18 20:56 that is apparently a field in the bio struct 2008-09-18 20:57 q is a carpet under which all kinds of doggie poo is swept 2008-09-18 20:57 it's really a bag tied onto the side of the bio 2008-09-18 20:57 we'll get rid of it before next christmas 2008-09-18 20:57 I hope 2008-09-18 20:57 I just want a nice aio read/write with priority interface for my coding 2008-09-18 20:57 you got it 2008-09-18 20:57 already 2008-09-18 20:58 well s/nice/nicer than what we had before/ 2008-09-18 20:58 that would be a good project.. a new aio interface 2008-09-18 20:58 right, I have the aio rw 2008-09-18 20:58 sounds like it should map easily enough.... 
2008-09-18 20:58 bio transfer is aio at its purest 2008-09-18 20:58 yeah 2008-09-18 20:58 right, but you want prioritization in there 2008-09-18 20:58 should be easier than non aio realy 2008-09-18 20:58 and that's what I'm failing to see 2008-09-18 20:58 maze, in the elevator 2008-09-18 20:58 'scuze my newbness, but wouldnt priority be at odds with queuing that the controllers try to do? 2008-09-18 20:58 so does the bio go through the elevator? 2008-09-18 20:59 bushman, interactions, yes 2008-09-18 20:59 not all good 2008-09-18 20:59 well, you want something htb like for io 2008-09-18 20:59 best to try and harmonize with them 2008-09-18 20:59 wait a minute, what's the layering here? 2008-09-18 21:00 is the physical hw under the elevator under the bio 2008-09-18 21:00 vfs <-> bio <-> driver 2008-09-18 21:00 and where's the elevator? 2008-09-18 21:00 between bio and driver 2008-09-18 21:00 vfs <-> bio <-> elevator <-> driver 2008-09-18 21:00 right? 2008-09-18 21:00 vfs <-> bio <-> elevator <-> driver 2008-09-18 21:00 ? 2008-09-18 21:00 heh 2008-09-18 21:00 heh 2008-09-18 21:00 exactly 2008-09-18 21:00 so by choosing the request queue in the bio, I choose priority of the request with regards to other requests? 2008-09-18 21:00 and the presence/lack of the elevator is up to the driver or virtual driver even 2008-09-18 21:01 so the elevator can appear at multiple or no places in the stack 2008-09-18 21:01 so the elevator messes with fields in the bios? 2008-09-18 21:01 is this screwy? or is this just me...? 2008-09-18 21:01 and vice versa in an idiotic way... 
sometimes useful way 2008-09-18 21:01 maze, it's screwy 2008-09-18 21:01 not just you 2008-09-18 21:01 but better than we had in 2.4 2008-09-18 21:02 it's damn fast actually, compared to a disk 2008-09-18 21:02 we didn't have that a few years ago 2008-09-18 21:02 now it's looking slow again 2008-09-18 21:02 and people are asking me to fix it 2008-09-18 21:02 it shall be done 2008-09-18 21:03 wait a minute - what is slow? 2008-09-18 21:03 the interfaces / kernel code? 2008-09-18 21:03 this who kooky chain 2008-09-18 21:03 whole 2008-09-18 21:03 vfs <-> bio <-> elevator <-> driver 2008-09-18 21:03 layering is right 2008-09-18 21:03 implementation is faulty 2008-09-18 21:03 agreed 2008-09-18 21:04 anyway 2008-09-18 21:04 we're using the existing one for now 2008-09-18 21:04 it will work for tux3 as well as it works for anybody 2008-09-18 21:04 better, because we will use it more directly 2008-09-18 21:04 and have fewer strange waits and so on 2008-09-18 21:04 right 2008-09-18 21:04 and when we do see a strange wait, we will be able to pounce on it 2008-09-18 21:04 that's why I wanted to go all the way down to the bio on the sb read 2008-09-18 21:05 a) for practice 2008-09-18 21:05 b) because it's the way it should be done 2008-09-18 21:05 unlike if you use the... odd... vfs block io helpers 2008-09-18 21:05 well I think we are going to stay all the way down here for tux3 2008-09-18 21:06 tux3 has no use asking other subsystems to submit bios on its behalf, unless that subsystem is an lvm 2008-09-18 21:06 and even then, we just submit a bio to the lvm without caring its not a real device 2008-09-18 21:06 still have to figure out how to do mmap like stuff (ie. 
trigger read in, on page fault, or write out, both for kernel and userspace, and cow, etc) 2008-09-18 21:06 maze, handled for you 2008-09-18 21:06 like magic 2008-09-18 21:06 cool - assuming it does the right thing (tm) 2008-09-18 21:06 see filemap.c -> nopage 2008-09-18 21:06 kinda right 2008-09-18 21:06 some messed locking 2008-09-18 21:07 which I'm not sure it does for cache coherency netfs 2008-09-18 21:07 bottlenecks on i_mutex during fault in 2008-09-18 21:07 bad 2008-09-18 21:07 so it probably needs to be gone through with a fine comb then 2008-09-18 21:07 even nfs is cache coherent/consistent with respect to mmap 2008-09-18 21:07 as I was expecting 2008-09-18 21:07 yes 2008-09-18 21:07 right in to the danger zone 2008-09-18 21:08 speaking of which 2008-09-18 21:08 what bottlenecks on i_mutex? 2008-09-18 21:08 time to turn on the ghetto blaster 2008-09-18 21:08 and get back to coding 2008-09-18 21:08 I'm assuming the code in filemap.c which deals with page-in/outs of mmapped pages 2008-09-18 21:08 oh, right it's already 10 past 9 2008-09-18 21:08 so is that it for this time? 2008-09-18 21:08 ACTION puts on Holst's the planets, performed by korean rock band 2008-09-18 21:09 ACTION scrolls back to remember his homework 2008-09-18 21:09 that's it, nice one maze 2008-09-18 21:09 is anybody sticking around to ask lame(er) questions? 2008-09-18 21:09 next time it will be razvanm's turn 2008-09-18 21:09 :P 2008-09-18 21:09 oh, awesome, what's he doing? 2008-09-18 21:09 to explain some more of _2copy 2008-09-18 21:09 ah 2008-09-18 21:09 lame question period is officially open 2008-09-18 21:10 intelligent questions banned 2008-09-18 21:10 what's an elevator? 
2008-09-18 21:10 ACTION doesn't have anything to ask this time 2008-09-18 21:10 a kernel elevator 2008-09-18 21:10 when you read/write data to a hard disk 2008-09-18 21:10 otherwise you're going to get some dumb jokes 2008-09-18 21:10 which is a spinning platter with a seeking head 2008-09-18 21:10 elevator = io scheduler 2008-09-18 21:10 then depending on the order you send out request 2008-09-18 21:10 just caught up 2008-09-18 21:11 you may need to do a small or large number of seeks 2008-09-18 21:11 like tivo for geeks 2008-09-18 21:11 yup, and it's algorithms are the same as a busy elevator in a skyscraper 2008-09-18 21:11 seeks are very expensive 2008-09-18 21:11 so you try to minimize seeks 2008-09-18 21:11 for good performance (b/w), but higher latency 2008-09-18 21:11 so are tlb misses 2008-09-18 21:11 and page cache misses 2008-09-18 21:11 you basically scan the disk from top to bottom, doing read writes at increasing lba addresses 2008-09-18 21:11 irregardless of the order they were submitted in 2008-09-18 21:11 then do the same thing going downwards 2008-09-18 21:12 somewhat downwards 2008-09-18 21:12 ok great, but from this level, can we be aware of what media we're writing to so we dont make it overinvolved in cases it doesnt matter, like solid state disks? 
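For readers following the elevator explanation above, here is a toy sketch of the sweep. Pending requests are served in increasing LBA order from the current head position, and then the head reverses and sweeps back down (LOOK). The `one_way` flag instead wraps around to the lowest pending LBA, which is the "skip the backwards step" variant (C-LOOK). The function name and the request numbers are made up for illustration. The kernel's real I/O schedulers also merge adjacent requests, enforce deadlines and track per-process fairness, and none of that is modeled here.

```python
def elevator_order(pending, head, one_way=False):
    """Return pending LBAs in elevator service order starting from `head`.

    Sweep upward through every request at or above the head, then either
    reverse and sweep downward (LOOK) or, if one_way is True, wrap around to
    the lowest pending LBA and keep going upward (C-LOOK).
    """
    ordered = sorted(pending)
    upward = [lba for lba in ordered if lba >= head]
    below = [lba for lba in ordered if lba < head]
    if one_way:
        return upward + below          # C-LOOK: skip the backwards sweep
    return upward + below[::-1]        # LOOK: come back down


# Requests in submission order; the head is currently at LBA 500.
submitted = [900, 120, 510, 40, 700, 300]
print(elevator_order(submitted, head=500))                # [510, 700, 900, 300, 120, 40]
print(elevator_order(submitted, head=500, one_way=True))  # [510, 700, 900, 40, 120, 300]
```

Submission order is ignored entirely. That is the point: six requests that would have bounced the head back and forth are serviced in at most two sweeps.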
2008-09-18 21:12 right 2008-09-18 21:12 the disk doesn't like going backwards as much as forwards 2008-09-18 21:12 the consecutive read/write sectors are still upwards 2008-09-18 21:12 Bushman: you can pick an io scheduler on a per-block-device basis 2008-09-18 21:12 and sometimes you skip the backwards step entirely 2008-09-18 21:12 depends 2008-09-18 21:12 bushman, mostly we don't care, where we do care we care a lot 2008-09-18 21:12 lots of fine tuning required to get optimal performance 2008-09-18 21:13 and it heavily depends on usecases 2008-09-18 21:13 /sys/block/sda/queue/scheduler 2008-09-18 21:13 as long as it's adjustable from userspace i'm good ;) 2008-09-18 21:13 plus you can throw in individual io priorities into the mix (ie. reading this sector is more important) 2008-09-18 21:13 we try to design for whole classes of usecases, rather than one at a time 2008-09-18 21:13 and b/w per job, and hard read/write deadlines, etc 2008-09-18 21:13 and it all gets complex 2008-09-18 21:13 http://friedcpu.wordpress.com/2007/07/17/why-arent-you-using-ionice-yet/ 2008-09-18 21:13 shapor, nice, i havent gotten used to the new linux, i've been bsd'ing since '03 2008-09-18 21:13 i only recently discovered ionice 2008-09-18 21:13 and the elevator is the piece of code which gets requests thrown at it 2008-09-18 21:14 i think mentioned on here 2008-09-18 21:14 does some algo mumbo jumbo to put them in the 'best' order 2008-09-18 21:14 shapor, because it doesn't work that well? 
2008-09-18 21:14 and throws them at the disk 2008-09-18 21:14 flips: yes but the interface is there 2008-09-18 21:14 if people use it they can report bugs 2008-09-18 21:14 sure 2008-09-18 21:14 if people dont report bugs or say it sucks on lkml it wont get fixed 2008-09-18 21:14 same problem with posix_fadvise 2008-09-18 21:14 note that for a network nic 2008-09-18 21:14 we will take it for a spin at some point 2008-09-18 21:15 you have a certain amount of b/w 2008-09-18 21:15 maze will ;) 2008-09-18 21:15 and it's all pretty easy - conceptually 2008-09-18 21:15 and shapor will make some nice charts of the event logs 2008-09-18 21:15 vfs + bio events 2008-09-18 21:15 oh i almost forgot about that 2008-09-18 21:15 sending each packet involves a fixed amount of headroom, (header fields), the packet itself, and a fixed footer 2008-09-18 21:15 still no clue how to glue those together 2008-09-18 21:15 so when you send a packet you know exactly how much of the nic (ie. for how long) you're using it up 2008-09-18 21:16 thus you can make very nice guarantees 2008-09-18 21:16 and this is what htb + sfq does for networking 2008-09-18 21:16 htb? sfq? 2008-09-18 21:16 you can partition your network card pretty much arbitrarily between diifferent apps 2008-09-18 21:16 giving different apps different priorities, then different priorities different amounts of bw 2008-09-18 21:16 and the priorities don't need to be strictly linear either 2008-09-18 21:16 htb? sfq? 2008-09-18 21:16 htb 2008-09-18 21:16 oh could i get in on the testing? i've done a lot of work visualizing sequences of events in temporal OSPF loops, this should be i could do ;) 2008-09-18 21:17 htb is basically a tree structure 2008-09-18 21:17 the nodes are were requests come in 2008-09-18 21:17 what's the tla mean? 
2008-09-18 21:17 the root is were requests come out 2008-09-18 21:17 so each application (or tcp stream, or whatever you're using) gets assigned to a leaf node in this tree 2008-09-18 21:17 (Stochastic Fairness Queueing) 2008-09-18 21:18 and the network driver then (when it wants to send) always pulls from the root 2008-09-18 21:18 gah 2008-09-18 21:18 each node in this tree has a certain speed of accumulating tokens 2008-09-18 21:18 (htb = hierarchical token buckets) 2008-09-18 21:18 that it accumulates in the bucket in that node 2008-09-18 21:18 wouldnt stochastic approach that every client is equally unhappy? ;) 2008-09-18 21:19 Bushman: sfq is used in the leafs to randomly select between clients / tcp streams you consider equivalent 2008-09-18 21:19 you hang an sfq off of each leaf node in htb, so you actually throw the packets at the correct sfq, and the htb leaf pulls it from the attached sfq 2008-09-18 21:19 network peeps are always reinventing the world ;) 2008-09-18 21:20 ah, so you use the hiarchical token buckets to assign different classes of service to different apps/streams? 2008-09-18 21:20 anyway, you divide up each nodes bandwidth among it's children 2008-09-18 21:20 and then define how and when they can borrow/lend tokens to each other 2008-09-18 21:20 I'm not doing a very good job of defining it here 2008-09-18 21:20 but it's wicked! 
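For readers following maze's description of token buckets, here is a deliberately simplified model of the borrowing idea. Each node accumulates tokens (bytes of credit) at its own rate up to a burst cap. A leaf spends its own tokens first and, when it runs dry, borrows from its parent's pool. The class name, the numbers and the "spare pool at the root" framing are assumptions made for this sketch. The real Linux HTB qdisc (separate rate and ceil limits per class, round robin among siblings, SFQ hung off the leaves) is considerably more involved.

```python
class Bucket:
    """One node of a simplified hierarchical token bucket tree (hypothetical).

    Tokens accumulate at `rate` bytes/sec up to `burst`. A node that runs
    dry may borrow from its parent, which is how bandwidth left idle at the
    parent can be lent to whichever child needs it.
    """

    def __init__(self, rate, burst, parent=None):
        self.rate = rate
        self.burst = burst
        self.parent = parent
        self.tokens = burst

    def refill(self, dt):
        """Add dt seconds' worth of tokens, capped at the burst size."""
        self.tokens = min(self.burst, self.tokens + self.rate * dt)

    def try_send(self, size):
        """Spend `size` tokens here, borrowing from ancestors if needed."""
        if self.tokens >= size:
            self.tokens -= size
            return True
        if self.parent is not None and self.parent.try_send(size):
            return True
        return False


link = Bucket(rate=1000, burst=1000)                     # spare capacity at the root
interactive = Bucket(rate=600, burst=600, parent=link)
bulk = Bucket(rate=400, burst=400, parent=link)

print(bulk.try_send(400))   # True: within bulk's own tokens
print(bulk.try_send(300))   # True: borrowed from the root pool
print(bulk.try_send(800))   # False: the root has only 700 left
```

Because a packet's cost on a NIC is known exactly (line rate, size, preamble, gap), charging tokens per byte like this gives firm guarantees. That predictability is what the discussion notes is missing for disks.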
2008-09-18 21:20 no- you're doing a great job 2008-09-18 21:20 maze, I'm getting the idea 2008-09-18 21:20 sounds wicked 2008-09-18 21:20 yea i just did a project with filtering/limiting at work, so i'm getting it 2008-09-18 21:21 it sounds a lot smarter than it is ;) 2008-09-18 21:21 well, disk layer doesn't have any such pretentions to sophistication 2008-09-18 21:21 yet 2008-09-18 21:21 heh 2008-09-18 21:21 damn academis justifying their existence 2008-09-18 21:21 anyway, basically htb + sfq is the best I've seen for networking, and would probably be awesome for other stuff as well like scheduling cpus 2008-09-18 21:21 I can imagine the mess if it did 2008-09-18 21:21 Bushman: gee filtering and limiting, i wouldn't have guessed :P 2008-09-18 21:21 except it's probably to compute intensive for that and can't take cache-heat or memory nearness into account 2008-09-18 21:21 shapor: stfu ;) 2008-09-18 21:22 :) 2008-09-18 21:22 anyway, with disk it gets tougher 2008-09-18 21:22 if it did, could be interesting as a cache coherency protocal 2008-09-18 21:22 because you can't just up and calculate how long a particular operation will take 2008-09-18 21:22 network peeps always trying to find the must obscrue TLA 2008-09-18 21:22 Bushman: don't you guys use bullets for limiting ? :P 2008-09-18 21:22 mot <- most obscure tla 2008-09-18 21:22 haha 2008-09-18 21:22 (with the nic, you know its line rate, you know how many bytes your sending, the size of the pre and post-amble, the wait between packets, you thus now the _entire_ cost of sending any given packet] 2008-09-18 21:23 dont make me whip out stories about invalidating keys with thermite granades 2008-09-18 21:23 motley cru 2008-09-18 21:23 tla? 2008-09-18 21:23 mot? 
2008-09-18 21:23 maze, and you don't know much carrier sense backout is going to cost ;) 2008-09-18 21:23 most obscure three letter acronym 2008-09-18 21:23 ah, so you use the hiarchical token buckets to assign different classes of service to different apps/streams? - precisely 2008-09-18 21:23 and that's where your pretentions to realtime control come crashing down 2008-09-18 21:23 which is a fla 2008-09-18 21:24 which is a tla 2008-09-18 21:24 which is a tla 2008-09-18 21:24 third time lucky 2008-09-18 21:24 for example I would give each user in my network their own sfq for local traffic to another nic (just switching) to another network via wireless and to the internet (via the same wireless) 2008-09-18 21:24 to make delivery time guaranteed, woudlnt you have to have full preempt kernel? (oh i miss 80ties Amigas) 2008-09-18 21:24 ACTION thinks of some keys he'd like invalidated 2008-09-18 21:24 and then use htb to make sure everything was fair on the slow internet link, and on the others at the same time - worked awesome 2008-09-18 21:25 be right back in 10. 2008-09-18 21:25 was a good one 2008-09-18 21:25 so who's hungry? 2008-09-18 21:25 me? 2008-09-18 21:25 was just going to order from bruno's 2008-09-18 21:25 we could meet there instead 2008-09-18 21:25 you don't coult, you're always hungry 2008-09-18 21:25 flips: i thought you were coding not slacking tonight 2008-09-18 21:25 ;-) 2008-09-18 21:25 i need to sleep, it's past midnight here damn it 2008-09-18 21:26 shapor, what do think I was doing while maze was talking? 2008-09-18 21:26 Bushman: I'll drink a zyweic for you :) 2008-09-18 21:26 Bushman: east coast? 2008-09-18 21:26 bushman, laterz 2008-09-18 21:26 you guys keep it too interesting 2008-09-18 21:26 heh thanks 2008-09-18 21:26 ACTION also goes to bed. Good night to everyone. 2008-09-18 21:27 shapor- you up for grub? 
2008-09-18 21:27 Shapor: you better have some Zywiec/Okocim handy when i invade LA again 2008-09-18 21:27 that would be "more grub" 2008-09-18 21:27 safe bet, shapor already cooked tonight 2008-09-18 21:27 tim_dimm_: yeah i already ate 2008-09-18 21:27 k, beer? 2008-09-18 21:27 flips: no, i had gDinner 2008-09-18 21:27 how about some chianti 2008-09-18 21:27 ? 2008-09-18 21:27 flips: did Shap introduce you to polish beer yet? 2008-09-18 21:27 heh 2008-09-18 21:27 don't need shap for that 2008-09-18 21:28 used to live in berlin 2008-09-18 21:28 ah yes, the spoils of war... ;) 2008-09-18 21:28 heh 2008-09-18 21:28 not sure which way that one cuts 2008-09-18 21:28 all the kings horses.... 2008-09-18 21:28 couldn't stop tanks 2008-09-18 21:28 but they stopped for a beer! 2008-09-18 21:29 berlin has lots of wayward poles 2008-09-18 21:29 drinking, mostly 2008-09-18 21:29 some leggy poles 2008-09-18 21:29 drinking 2008-09-18 21:29 flips: since its late night, how's swingers sound? 2008-09-18 21:29 even the ubermensch need a brewsky 2008-09-18 21:29 or playing with the berlin boys 2008-09-18 21:29 toying actually 2008-09-18 21:29 berlin boy toys 2008-09-18 21:29 also fun for the finhish girls 2008-09-18 21:29 finnish 2008-09-18 21:29 tim_dimm_: people who dont know LA might take that out of context ;) 2008-09-18 21:30 i thought of that as soon as I hit enter 2008-09-18 21:30 esp with the PC bunch we have in here 2008-09-18 21:30 for the record, swingers is a diner 2008-09-18 21:30 shapor: maybe i should tell them about how you behaved when i took you out to boystown in chicago ;) 2008-09-18 21:30 hahah 2008-09-18 21:30 lol 2008-09-18 21:30 tim_dimm_, 802 Broadway? 2008-09-18 21:31 yeah 2008-09-18 21:31 corner of lincoln and broadway 2008-09-18 21:31 shap? 2008-09-18 21:31 sure 2008-09-18 21:31 tim_dimm_, 22 oclock? 2008-09-18 21:31 pick u up 2008-09-18 21:31 ? 
2008-09-18 21:31 i could go for a vanilla chai latte 2008-09-18 21:31 kay 2008-09-18 21:31 sure 2008-09-18 21:31 good idea 2008-09-18 21:31 you commie bastards 2008-09-18 21:31 keep those wrists safe tonight 2008-09-18 21:31 i got waffle house 2008-09-18 21:32 :) 2008-09-18 21:32 k rollin in ten 2008-09-18 21:32 you coming by here? 2008-09-18 21:32 shapor: drive by u then flips 2008-09-18 21:32 yeah 2008-09-18 21:32 good 2008-09-18 21:32 sure 2008-09-18 21:32 see you then 2008-09-18 21:32 k 2008-09-18 21:32 got 28 minutes to hack on dleaf 2008-09-18 21:32 bushman, good to meet you 2008-09-18 21:32 ACTION puts pants on 2008-09-18 21:32 nice to talk with everyone 2008-09-18 21:33 swingers, pants 2008-09-18 21:33 bushman, see you soon 2008-09-18 21:33 wtf dude 2008-09-18 21:33 haha 2008-09-18 21:33 ;-) 2008-09-18 21:33 oh if shap is putting pants on... SAY HI TO JOELLE! 2008-09-18 21:33 bushman, we need to meet up 2008-09-18 21:33 yea i know, end of fiscal year madness here, maybe this weekend we'll talk more 2008-09-18 21:33 Bushman: she says hi ;0 2008-09-18 21:33 ;) rather 2008-09-18 21:34 bushman, works for me 2008-09-18 21:35 flips: my boss been just tasked with writing the next orange book like thing, so we can make our requirements whatever we want, literally 2008-09-18 21:36 this is DoD/govt wide stuff, seriously influential development for the next decade, so it's the perfect moment to sneak in all kinds of security goodness 2008-09-18 21:37 bushman, sweet 2008-09-18 21:37 means I'd better bootstrap my clue 2008-09-18 21:38 i get to be the technical ideas feeder, as tehy're more policy, so if you got good ideas, i'm all ears 2008-09-18 21:38 what color is this one going to be? 2008-09-18 21:38 green book? 2008-09-18 21:38 this is la after all 2008-09-18 21:38 nah, the rainbow series been retired, dunno what it's gonna be called 2008-09-18 21:39 leetbook 2008-09-18 21:39 they've realized common criteria was an EPIC FAIL! 
2008-09-18 21:39 nice 2008-09-18 21:39 onion book 2008-09-18 21:39 Bushman: isn't that what you spent a year of raduate school pissing and moaning about? 2008-09-18 21:39 or... maybe pomegranite 2008-09-18 21:39 anyway, must sleep, got some hacking certification to pass tommorow 2008-09-18 21:39 pomegrantis have excellent security... isolation... 2008-09-18 21:40 compartmentalization 2008-09-18 21:40 yes, that was the class i was forced into when you were 'visiting' 2008-09-18 21:40 robustness... 2008-09-18 21:40 Bushman: thanks again for that ;) 2008-09-18 21:40 hmm 2008-09-18 21:40 they always pick such meaningful names like dod8200.2 or dcid6/3 2008-09-18 21:42 now , now, now, not all poles drink 2008-09-18 21:42 ...heavily... 2008-09-18 21:42 no? 2008-09-18 21:42 we take breaks 2008-09-18 21:43 true 2008-09-18 21:43 to sleep... 2008-09-18 21:43 of sorts 2008-09-18 21:43 my break been too long, i need my okocim porter damn it 2008-09-18 21:44 orange book on what? 2008-09-18 21:44 security requirements for high assurance computer systems 2008-09-18 21:44 ah 2008-09-18 21:44 govt/mil style stuff 2008-09-18 21:44 http://en.wikipedia.org/wiki/TCSEC 2008-09-18 21:45 that can be interesting 2008-09-18 21:45 red hat and suse got b2 a while back 2008-09-18 21:45 so long as you don't have to write or read it 2008-09-18 21:45 I seem to recall 2008-09-18 21:45 those documents need an interface layer 2008-09-18 21:45 heh, i forgot the official name of it, i actually got the orange covered book on my shelf ;) 2008-09-18 21:45 so did windows nt... for a rather specialized configuration 2008-09-18 21:45 (ie. 
come with your own personal interpreter) 2008-09-18 21:45 with the network unplugged I think it was 2008-09-18 21:45 well, you need stuff like labeled packets on a network, 2008-09-18 21:46 i totally agree, that's why we're redoing it, cuz everything up to this point sucks 2008-09-18 21:46 isolation 2008-09-18 21:46 yeah, lots of fun 2008-09-18 21:46 maze, I agree with the concept of defining the functionality rather than the interface 2008-09-18 21:46 heh, if you had citizenship i could hire you right now ;) 2008-09-18 21:46 and while you're at it, you should probably make sure to sneak in good performance 2008-09-18 21:46 defining interfaces just doesn't fly with linux kern hacks 2008-09-18 21:47 I'd make sure a decent qos and the like makes it in 2008-09-18 21:47 these people couldnt give less shit about performance 2008-09-18 21:47 barriers 2008-09-18 21:47 that's what they want 2008-09-18 21:47 MaZe: good point.. you really have to sneak in performance 2008-09-18 21:47 I'd want these to be able to interoperate 2008-09-18 21:47 can't get there from here, provably 2008-09-18 21:47 with a public network like the internet 2008-09-18 21:47 the only way to sneak in performance is to push for small code 2008-09-18 21:48 fortunately, performance and provability tend to go hand in hand 2008-09-18 21:48 that's the only spot where security and performance principles meet 2008-09-18 21:48 or make this be the standard for the backbone or something 2008-09-18 21:48 yup 2008-09-18 21:48 because to get performance you need simple 2008-09-18 21:48 less code - easier to understand 2008-09-18 21:48 NOT YOUR CODE! 2008-09-18 21:48 easier to prove correct (or believe correct)/understand 2008-09-18 21:48 Bushman: no US Citizenship, just Canadian ;-) we Canadians rule the world. 2008-09-18 21:49 i've been reading tux' code, holy crap, did you all grow up coding up demos for amigas in the 80ties? 
2008-09-18 21:49 imo, that's a requirement for decent coding 2008-09-18 21:49 bushman, you need to read some other filesystem code 2008-09-18 21:49 it's just as dense, but not as performant for the most part 2008-09-18 21:50 i know, i'm kidding, but i saw bitslicing and i went 'oh shnap, they didnt...' 2008-09-18 21:50 the real trick, is you want to define nice clean apis/interfaces, then stick to them without breaking through the layering 2008-09-18 21:50 ACTION thinks about cutting and pasting some vfat code 2008-09-18 21:50 while at the same time avoid layer for the sake of layers - a lot of the vfs code is just wrappers on wrappers - sad 2008-09-18 21:50 bushman, the bitfields stuff will go, mostly 2008-09-18 21:50 can't have that on the actual media 2008-09-18 21:50 and a vital point is, the apis have to be precisely and accurately defined and documented 2008-09-18 21:50 -!- tim_dimm__(~mobile@32.172.89.233) has joined #tux3 2008-09-18 21:51 too much variation between machine architectures 2008-09-18 21:51 yea i was thinking how easy that would be to do buffer overflows on 2008-09-18 21:51 maze, careful there 2008-09-18 21:51 does this function sleep? what's the input? the output? how does it deal with errors? what errors can it return? how long can it take? 2008-09-18 21:51 maze, there is a long and star studded history of api proposals to the linux kernel core that failed 2008-09-18 21:51 hmm? 
2008-09-18 21:52 Shapor: outside dude 2008-09-18 21:52 tim_dimm__: k 2008-09-18 21:52 you almost want a language where you can have write constraints for before/after/during execution of each function...like Eiffel 2008-09-18 21:52 selinux only just squeeked in 2008-09-18 21:52 Bushman: agreed 2008-09-18 21:52 that should almost be a gov requirement for the code thats secure 2008-09-18 21:52 most other consortium/thinktank type apis have failed to get merged 2008-09-18 21:52 that's how stock market software is done 2008-09-18 21:52 not to say it can't happen 2008-09-18 21:52 but one has to be _very careful_ 2008-09-18 21:53 careful with what? 2008-09-18 21:53 i got to sit down with the creator of eiffel few months ago, very smart dude 2008-09-18 21:53 maze, proposing apis to linus 2008-09-18 21:53 ah 2008-09-18 21:53 better to propose functionality 2008-09-18 21:53 define the functionality and the invariants 2008-09-18 21:53 where's the problem? he doesnt' like them? 2008-09-18 21:53 oh 2008-09-18 21:53 let linus and friends take it to api 2008-09-18 21:53 yea, linus seems to be a big proponent of order emergent out of chaos... 
2008-09-18 21:53 perhaps with some helpful guidance 2008-09-18 21:54 yeah, I kind of consider the api to be the functionality/invariants 2008-09-18 21:54 I'm not sure I'm really aware of the difference 2008-09-18 21:54 maze, he has a healthly disrespect for anybody else's ability to design a robust api 2008-09-18 21:54 linus isn't the world's greatest either, but its his kernel 2008-09-18 21:54 he's not the worst either, or even below 99th percentile 2008-09-18 21:54 ok, by api, I don't mean it can't be changed later - as in stable 2008-09-18 21:55 api for demo apps, sure 2008-09-18 21:55 I mean a layer below which you don't have to descend to understand what it will do 2008-09-18 21:55 yea but how do you arrive at stable without having it in real action for a while 2008-09-18 21:55 reference implementation 2008-09-18 21:55 jsut don' 2008-09-18 21:55 jsut don't let it grow into a huge undertaking with emotional baggage 2008-09-18 21:55 bushman, true 2008-09-18 21:55 well 2008-09-18 21:55 some strange bootstrap 2008-09-18 21:56 usually best to work with incremental modifactions 2008-09-18 21:56 you have to have a clear idea of where you're heading 2008-09-18 21:56 usually you get that the second or third time around the block 2008-09-18 21:56 because you've tried it yourself, yes 2008-09-18 21:56 but that still doesn't cut it with core 2008-09-18 21:56 i was a sysadmin for a long time, if i learned anything is that long term reality beats out fuzzing/use cases anytime ;) 2008-09-18 21:56 -!- tim_dimm__(~mobile@32.172.89.233) has joined #tux3 2008-09-18 21:56 bushman, exactly 2008-09-18 21:56 and the reality is posix 2008-09-18 21:57 I'm not sure what you mean by that, especially by fuzzing 2008-09-18 21:57 yeah and postfix is broken 2008-09-18 21:57 so this will succeed to the extent it builds on that 2008-09-18 21:57 posix is nice because it's a standard 2008-09-18 21:57 but, oh boy, what a standard it is... 
2008-09-18 21:57 Flips: be outside in 3 min 2008-09-18 21:57 and because linus cares about it in a backhanded way 2008-09-18 21:57 kay 2008-09-18 21:57 ACTION thinks about pants 2008-09-18 21:58 anybody gonna come pick me up? 2008-09-18 21:58 housecoat currently in case any of you were wondering ;) 2008-09-18 21:58 let anything run in real production environment long enough and it's gonna encounter more bugs than all the test cases you can predict/generate. all tests are contrived. reality is strangely objective 2008-09-18 21:58 maze, we'll send the learjet by in 3 years 2008-09-18 21:58 Bushman: I'm an SRE right now, so I know ;-) 2008-09-18 21:59 what's SRE? 2008-09-18 21:59 bushman, it's not really true of core kernel though 2008-09-18 21:59 -!- tim_dimm__(~mobile@32.172.89.233) has joined #tux3 2008-09-18 21:59 way more bugs are squeezed out before it gets into hands of users 2008-09-18 21:59 or we'd be dead 2008-09-18 21:59 site reliability engineer for google, running crawling and indexing, all the way from machines to 70% or so of the way up the stack 2008-09-18 22:00 that's true, to me kernel is something that joins the clarity of time travel with readability of alchemy ;) 2008-09-18 22:01 well, we used to have the nice 2.odd trees, now all users are beta testers :/ 2008-09-18 22:01 they are also much smaller changes though 2008-09-18 22:01 alrightt, 1am, time to pass out, over an out, great meeting you all 2008-09-18 22:02 nice to meet you 2008-09-18 22:02 and you have the stable kernels as well - the ones in RHEL4/5 and 2.6.16.X and then you have the newer ones being tested in fedora and ubuntu and desktop distros, and then you have bleeding edge in unreleased distros (fedora 10, etc) 2008-09-18 22:02 mainline 2008-09-18 22:03 so this is something I'm not sure about, but I think the trees used by distros have gotten _MUCH_ closer to mainline 2008-09-18 22:03 Flips outside now 2008-09-18 22:03 cu 2008-09-18 22:03 bye guys 2008-09-18 22:04 I used to compile 
kernels from source back in 2.4 days 2008-09-18 22:05 Mobile irc = busted 2008-09-18 22:05 :) 2008-09-18 22:05 nowadays I use whatever distro provided kernel is available 2008-09-18 22:05 ie. right now I'm running fedora 9 and tracking koji (ie. running 2.6.26.5-42 now) 2008-09-18 22:06 the problem with building your own kernel is it's so freaking complex to get the right config options 2008-09-18 22:06 not to mention you end up with a config noone has tested... 2008-09-18 22:06 and you end up building so many modules you'll never use 2008-09-18 22:07 (make config could really use a detect usb/pci/etc devices present in system and enable those forcibly, disable the rest) 2008-09-18 22:07 I would guess this actually means almost everybody is running a distro provided kernel 2008-09-18 22:11 Thanks! 2008-09-18 23:29 maze, make defconfig is your friend 2008-09-18 23:29 hmm? 2008-09-18 23:29 try it 2008-09-18 23:29 oh, speaking of that, yeah, still 2008-09-18 23:30 I've got a perfectly good kernel someone else deals with 2008-09-18 23:30 and I can compile modules against it 2008-09-18 23:30 I'm happy ;-) 2008-09-18 23:30 you'll get over it 2008-09-18 23:31 that being happy thing 2008-09-18 23:31 these days you just cat your config out of proc 2008-09-18 23:32 cat /proc/config.gz | gunzip | less 2008-09-18 23:32 and lsmod 2008-09-18 23:34 lots of windows peeps reading our mailing list archives 2008-09-18 23:35 chances are, linux hacks running company laptops 2008-09-18 23:35 but you never know 2008-09-18 23:40 or bots trying to be subtle 2008-09-18 23:40 sneakbots 2008-09-18 23:41 flipz_out: do you have stats? 2008-09-18 23:41 maybe 2008-09-18 23:41 installed the stats thing 2008-09-18 23:41 didn't check it 2008-09-18 23:41 what's the command? 
2008-09-18 23:41 webalizer 2008-09-18 23:41 it produces html output 2008-09-18 23:42 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-18 23:42 lame 2008-09-18 23:42 well 2008-09-18 23:42 by default /var/www/webalizer in debian i think 2008-09-18 23:42 maybe 2008-09-18 23:42 got to get it doing something more sensible 2008-09-18 23:42 only giving me per-month right now 2008-09-18 23:43 I want per-hour 2008-09-18 23:43 monthly for may??? 2008-09-18 23:43 wtf 2008-09-18 23:43 Usage Statistics for tux3.org 2008-09-18 23:43 Summary Period: May 2007 2008-09-18 23:43 Generated 18-Sep-2008 23:41 PDT 2008-09-18 23:43 [Daily Statistics] [Hourly Statistics] [URLs] [Entry] [Exit] [Sites] [Referrers] [Search] [Agents] [Locations] 2008-09-18 23:43 Monthly Statistics for May 2007 2008-09-18 23:44 got more important things to do than give enemas to stats scripts 2008-09-18 23:45 1 45 58.44% slashdot.org/comments.pl 2008-09-18 23:47 microsoft seems to be crawling my site with the user-agent 2008-09-18 23:47 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322) 2008-09-18 23:48 ah 2008-09-18 23:48 seen that much 2008-09-18 23:48 from diverse ip addresses 2008-09-18 23:48 evil? 2008-09-18 23:48 its obviously a bot 2008-09-18 23:48 what makes you think its msftbot? 
2008-09-18 23:48 its grabbing an html file 2008-09-18 23:48 msnbot 2008-09-18 23:48 and not any of the jpgs linked on it 2008-09-18 23:48 also 2008-09-18 23:49 the ip's belong to msft ;) 2008-09-18 23:49 they're not even good at sneaking 2008-09-18 23:50 65.55.109.0/24 and 65.55.110.0/24 2008-09-18 23:51 OrgName: Microsoft Corp 2008-09-18 23:51 and they set the referrer to 2008-09-18 23:51 http://search.live.com/results.aspx?q=camera 2008-09-18 23:51 to make it look like people are using live.com to find my site 2008-09-18 23:51 seems shady 2008-09-18 23:52 where "camera" is any common word which appears in my site 2008-09-18 23:53 shady 2008-09-18 23:53 ballmer style 2008-09-18 23:53 http://ekstreme.com/thingsofsorts/blogging/yell-if-microsofts-livecom-spammed-you-too 2008-09-18 23:54 with all the subtletly of a giraffe in a japanese tea house 2008-09-18 23:54 this has been going on a long time 2008-09-18 23:58 i guess its them trying to emulate a human 2008-09-18 23:58 seems stupid 2008-09-19 00:02 but can it pogo 2008-09-19 00:15 ... 2008-09-19 00:20 Shapor: # Idle priority is VERY cautious about marking block devices idle. If your foreground tasks are using disk, then your background tasks will become noticeably slower, as they get blocked from touching the disks until Linux knows for sure your foreground tasks have all had a chance at the disk. Most of the times, you don't care about this anyway, but don't run a torrent in non-idle class and expect a 20GB copy to finish till the torrent's done! 2008-09-19 00:20 == lame 2008-09-19 00:20 http://friedcpu.wordpress.com/2007/07/17/why-arent-you-using-ionice-yet/ 2008-09-19 00:21 got to be a hint why only phb oriiented vendors provide it by default 2008-09-19 00:21 course...
maybe that's every vendor ;) 2008-09-19 00:32 yeah its not very fine grained 2008-09-19 00:33 that VERY is a red flag 2008-09-19 00:34 i didn't generate hard data but when i use a combinatino of ionice and nice compressing log files the system seems more responsive 2008-09-19 00:34 than if i dont use them 2008-09-19 00:34 i dont care how long it takes to compress my log files really 2008-09-19 00:35 so if ANY other io wants my disk let it have it 2008-09-19 00:35 if the log compression never finishes thats ok 2008-09-19 00:35 it is useful even in its current state 2008-09-19 00:36 now i just need a version of cat which puts posix_fadvise in the io even loop so it doesn't piss on my buffer cache either 2008-09-19 00:37 cat --dont-piss-on-my-buffer-cache 2008-09-19 00:37 mount -ttux3 -oloop foodev /mnt 2008-09-19 00:37 we start here. 2008-09-19 00:37 wow! we got here 2008-09-19 00:38 my cut n paste of mazes junk just worked 2008-09-19 00:38 not junk 2008-09-19 00:38 junkfs :) 2008-09-19 00:38 junkfs reulz 2008-09-19 00:39 unlike maze, I did not have to reboot my workstation 2008-09-19 00:39 because I ran it under uml 2008-09-19 00:39 worth getting that working 2008-09-19 00:40 heh 2008-09-19 00:40 mount: wrong fs type, bad option, bad superblock on /dev/loop0, 2008-09-19 00:40 or too many mounted file systems 2008-09-19 00:40 (aren't you trying to mount an extended partition, 2008-09-19 00:40 instead of some logical partition inside?) 
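The wished-for "cat --dont-piss-on-my-buffer-cache" above can be approximated with `posix_fadvise`. Here is a minimal sketch, assuming Linux and Python's `os.posix_fadvise` (not available on every platform). The function name and the 1 MiB chunk size are arbitrary choices for illustration. `POSIX_FADV_DONTNEED` is only advice: the kernel may ignore it, and it will not drop pages that are still dirty.

```python
import os
import sys

CHUNK = 1 << 20  # 1 MiB per read; an arbitrary size chosen for this sketch


def cat_without_caching(path, out=None):
    """Copy `path` to `out` (default: stdout), telling the kernel after each
    chunk that we will not need those pages again (POSIX_FADV_DONTNEED).
    Returns the number of bytes copied. Hypothetical helper."""
    if out is None:
        out = sys.stdout.buffer
    fd = os.open(path, os.O_RDONLY)
    try:
        # Declare a sequential scan so readahead can be aggressive.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        offset = 0
        while True:
            chunk = os.read(fd, CHUNK)
            if not chunk:
                break
            out.write(chunk)
            # Done with this range: ask the kernel to evict it from the page cache.
            os.posix_fadvise(fd, offset, len(chunk), os.POSIX_FADV_DONTNEED)
            offset += len(chunk)
    finally:
        os.close(fd)
    return offset
```

Combined with `ionice -c3` for idle-class I/O, a log compression job piped from this helper should stay out of the way of both the disk and the page cache. As noted above, though, that is only an impression and was never measured.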
2008-09-19 00:40 ok, let's get out the junk mop 2008-09-19 00:40 and make it presentable for tux3 checkin #3 2008-09-19 00:41 i thought you were working on extents ;) 2008-09-19 00:41 or #4 if you count the lame git checkin 2008-09-19 00:41 this is related to extents, just as ketchup is related to ice cream 2008-09-19 00:42 maze done good in here 2008-09-19 00:43 I particularly like the little hexdump in 7 lines 2008-09-19 00:43 ACTION cuts it down to 6 2008-09-19 01:00 flipz: where did dir.c come from 2008-09-19 01:00 looks a lot different from fs/ext2/dir.c in a recent kernel 2008-09-19 01:00 dir.c? 2008-09-19 01:00 oh 2008-09-19 01:00 it's the same 2008-09-19 01:01 just marginally cleaned up 2008-09-19 01:01 and got rid of the page wanking 2008-09-19 01:01 when back to buffer ops as god intended 2008-09-19 01:02 changed the interface a bit? 2008-09-19 01:02 not really 2008-09-19 01:02 ext2_create_entry 2008-09-19 01:02 ? 2008-09-19 01:03 pretty much the same 2008-09-19 01:03 that level isn't implemented 2008-09-19 01:03 in tux3 2008-09-19 01:03 well 2008-09-19 01:03 it's in inode.c 2008-09-19 01:03 well i dont see ext2_create_entry in the ext2/dir.c 2008-09-19 01:03 in fact lxr says it doesn't exist 2008-09-19 01:03 try namei.c 2008-09-19 01:03 mknod 2008-09-19 01:03 or something 2008-09-19 01:03 did you rename it? 2008-09-19 01:03 buncha verbosity 2008-09-19 01:04 were they passing in a dentry before? 
2008-09-19 01:04 no 2008-09-19 01:04 it's trivial 2008-09-19 01:04 um 2008-09-19 01:04 ok 2008-09-19 01:04 you want to know the name 2008-09-19 01:04 justa sec 2008-09-19 01:05 ext2_add_link 2008-09-19 01:05 dumb name 2008-09-19 01:05 yeah thats what i thought 2008-09-19 01:06 you changed the interface 2008-09-19 01:06 I really didn't change much 2008-09-19 01:06 did not want to discover new bugz 2008-09-19 01:06 because they create the dentry first 2008-09-19 01:06 and pass that 2008-09-19 01:06 rather than a filename 2008-09-19 01:06 hmm I did a little 2008-09-19 01:06 because no dentries in tux3 userspace 2008-09-19 01:06 and they call ext2_create seperately 2008-09-19 01:07 hmm 2008-09-19 01:07 caught me 2008-09-19 01:07 perhaps there should be 2008-09-19 01:07 to make kernel port easier 2008-09-19 01:07 ext2 is not an exemplary model for namespace structure 2008-09-19 01:07 hmm 2008-09-19 01:07 this is all fs internal 2008-09-19 01:07 ok 2008-09-19 01:08 might as well drop some of the braindamage 2008-09-19 01:08 good call on that though 2008-09-19 01:08 i'm just trying to fix a bug in it ;) 2008-09-19 01:08 bug! 2008-09-19 01:08 i think you did introduce one 2008-09-19 01:08 ;) 2008-09-19 01:08 happens 2008-09-19 01:15 feels like there are too many interfaces in ext2/dir.c 2008-09-19 01:15 yes 2008-09-19 01:15 a linux meme 2008-09-19 01:16 making interfaces looks confusingly like productive work 2008-09-19 01:19 now... why did maze put a wait queue inside the bio 2008-09-19 01:19 looking forward to the explanation ;) 2008-09-19 01:19 ACTION unborks 2008-09-19 01:20 it seems like a lot of stuff is landing in our inode.c 2008-09-19 01:20 sposed to put a pointer to the wait queue there, not the wait queue itself 2008-09-19 01:20 sure 2008-09-19 01:20 inode.c is a toilet 2008-09-19 01:20 heh 2008-09-19 01:20 by tradition 2008-09-19 01:20 dont flush it! 
2008-09-19 01:20 might lose something good 2008-09-19 01:22 ah so the vfs does indeed hand you a dentry 2008-09-19 01:22 not a filename 2008-09-19 01:22 man lxr is fucking slow 2008-09-19 01:23 i'm going to run my own 2008-09-19 01:23 damn europeans much be awake 2008-09-19 01:23 good luck installing it 2008-09-19 01:23 must even 2008-09-19 01:23 let me know how it works out 2008-09-19 01:23 hrm the interface is kinda crap 2008-09-19 01:24 ACTION tries not getting sidetracked making lxr not suck as much 2008-09-19 01:29 shapor, know a shell command for writing a few bytes at the beginning of a file without truncating the file? 2008-09-19 01:30 reiserfs has some weird looking shit in it 2008-09-19 01:30 you don't say 2008-09-19 01:31 take a simple idea and make it weird 2008-09-19 01:31 flipz: dd ? 2008-09-19 01:31 how bout that shell command? 2008-09-19 01:31 ah 2008-09-19 01:31 didn't know it could do that 2008-09-19 01:31 notrunc i think 2008-09-19 01:33 conv=notrunc 2008-09-19 01:33 lets you plop down data in it without truncating 2008-09-19 01:33 dd conv=notrunc if=hello of=foodev 2008-09-19 01:33 dd has a really weird command syntax 2008-09-19 01:34 root@usermode:~# ./tux3 2008-09-19 01:34 we start here. 2008-09-19 01:34 wow! we got here 2008-09-19 01:34 super = 68 65 6C 6C 6F 0A 00 00 00 00 00 00 00 00 00 00 2008-09-19 01:34 mount: Not a directory 2008-09-19 01:34 with maze's 'art' fixed 2008-09-19 01:35 the number of right things in maze's little hack _vastly_ outnumbers the wrong things 2008-09-19 01:35 but the wrong things are doozers ;) 2008-09-19 01:36 "It is rumored to have been based on IBM's JCL, and though the syntax may have been a joke[1], there seems never to have been any effort to write a more Unix-like replacement." 2008-09-19 01:36 from the wikipedia dd page 2008-09-19 01:36 http://en.wikipedia.org/wiki/Dd_(Unix) 2008-09-19 01:36 :p 2008-09-19 01:36 longest running joke in unix 2008-09-19 01:37 dd deprecated? 
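The dd conv=notrunc trick works because the output file is opened without O_TRUNC. A minimal C equivalent, assuming the file already exists (dd itself creates a missing output file, this sketch does not) and using the hypothetical helper name overwrite_head, opens the file O_WRONLY and writes with pwrite() at offset 0. The file size and every byte past the written range stay untouched.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite len bytes at the start of an existing file without
 * truncating it, like `dd conv=notrunc if=hello of=foodev`.
 * Returns 0 on success, -1 on error. */
static int overwrite_head(const char *path, const void *buf, size_t len)
{
	assert(path != NULL && (buf != NULL || len == 0));

	int fd = open(path, O_WRONLY);		/* note: no O_TRUNC */
	if (fd < 0)
		return -1;
	ssize_t n = pwrite(fd, buf, len, 0);
	int rc = close(fd);			/* always close, even on error */
	if (n < 0 || (size_t)n != len || rc != 0)
		return -1;
	return 0;
}
```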
2008-09-19 01:37 i think not 2008-09-19 01:37 flipz: we should fix it :) 2008-09-19 01:38 right, if only because we own the name 2008-09-19 01:38 yup 2008-09-19 01:38 ddcp 2008-09-19 01:38 nah too long 2008-09-19 01:38 and its not cp 2008-09-19 01:38 dd --oldbroken 2008-09-19 01:38 dd2 2008-09-19 01:38 dd --muchbetter 2008-09-19 01:39 ddd 2008-09-19 01:39 or how about just "d" 2008-09-19 01:40 dd with a symlink 2008-09-19 01:42 hardlink 2008-09-19 01:42 mandatory 2008-09-19 01:42 provide legacy compatability if the argv[0] is dd 2008-09-19 01:42 otherwise new hawtness 2008-09-19 01:43 root@usermode:~# ./tux3 2008-09-19 01:43 we start here. 2008-09-19 01:43 wow! we got here 2008-09-19 01:43 super = 68 65 6C 6C 6F 0A 00 00 00 00 00 00 00 00 00 00 2008-09-19 01:43 root@usermode:~# mount 2008-09-19 01:43 /dev/ubda on / type ext2 (rw) 2008-09-19 01:43 proc on /proc type proc (rw) 2008-09-19 01:43 devpts on /dev/pts type devpts (rw,gid=5,mode=620) 2008-09-19 01:43 /root/foodev on /mnt type tux3 (rw,loop=/dev/loop0) 2008-09-19 01:43 that's enough for tonight 2008-09-19 01:43 sweet 2008-09-19 01:43 almost ;) 2008-09-19 01:43 so i'm trying to get a backup of your git tree up on github.com 2008-09-19 01:43 how'd you clone it? 
2008-09-19 01:43 they already have linus's tree 2008-09-19 01:44 I failed 2008-09-19 01:44 so i forked it 2008-09-19 01:44 always forget how 2008-09-19 01:44 now i'm just trying to push your changes in to it 2008-09-19 01:44 just clone mine 2008-09-19 01:44 i dont think i can 2008-09-19 01:44 don't rebase to anything 2008-09-19 01:44 well 2008-09-19 01:44 I'll fix that 2008-09-19 01:44 tomorrow 2008-09-19 01:44 you need to have the git service running 2008-09-19 01:44 maybe you do 2008-09-19 01:44 I do 2008-09-19 01:44 it's just configged borkly 2008-09-19 01:45 git ui braindamage as much as anything 2008-09-19 01:45 nothing is obvious 2008-09-19 01:45 telnet phunq.net 9418 2008-09-19 01:45 yeah you do 2008-09-19 01:45 mercurial is altogether more usable in this and other ways 2008-09-19 01:46 we should get the whole vfs running in user space 2008-09-19 01:46 would be killer for testing 2008-09-19 01:46 I'm milding interested in doing a dentry like thing 2008-09-19 01:46 but we have fuse for that, really 2008-09-19 01:46 fuse is... 
2008-09-19 01:46 ugh 2008-09-19 01:46 we just need to use it better 2008-09-19 01:46 yeah 2008-09-19 01:46 true 2008-09-19 01:46 we're really fitting sideways into it right now 2008-09-19 01:46 yeah its gross 2008-09-19 01:46 I'm amazed anything at all works 2008-09-19 01:47 the bug i was fixing 2008-09-19 01:47 is trying to create a file with a name which is too long 2008-09-19 01:47 returns an error 2008-09-19 01:47 that its too long 2008-09-19 01:47 as it should 2008-09-19 01:47 but creates it anyway 2008-09-19 01:47 oh, bad 2008-09-19 01:47 with the name truncated 2008-09-19 01:47 and no inode 2008-09-19 01:47 naughty 2008-09-19 01:47 its fucked 2008-09-19 01:47 heh 2008-09-19 01:47 I doubt that was my idea 2008-09-19 01:47 its a case you never get in dir.c 2008-09-19 01:48 because it checks when it creates the dentry 2008-09-19 01:48 nice catch 2008-09-19 01:48 in the vfs 2008-09-19 01:48 since we dont use it 2008-09-19 01:48 lamissimo 2008-09-19 01:48 yeah 2008-09-19 01:48 "always check your inputs" 2008-09-19 01:48 yeah 2008-09-19 01:48 that one musta got quietly slipped by ted 2008-09-19 01:48 subtle due to a minor interface change 2008-09-19 01:49 and its not liek dentries have fixed sized strings in d_name 2008-09-19 01:49 so there is no hard maximum 2008-09-19 01:49 its just supposed be be checked 2008-09-19 01:49 i dunno the limit is rediculously short 2008-09-19 01:50 i think 255 bytes maybe 2008-09-19 01:50 why not allow for long filenames 2008-09-19 01:50 that's considered long 2008-09-19 01:51 i suppose 2008-09-19 01:51 silly limitation 2008-09-19 01:51 useful silly limitation 2008-09-19 01:51 of course it comes from wanting to represent the length with a byte 2008-09-19 01:51 so you can have fixed size dentries? 2008-09-19 01:51 fixed size? 2008-09-19 01:52 i'm asking 2008-09-19 01:52 is that the reason? 
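The bug described above (an over-long name returns an error but the entry is created anyway, truncated) comes from validating after mutating. A sketch of the safer ordering follows, using a hypothetical toy directory format with a one-byte length per record; it is not ext2's or tux3's real layout. Every check runs before the first byte is written, and the 255 limit falls directly out of the u8 length field.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME_LEN 255	/* largest value a one-byte length can hold */

/* Toy directory block: a packed run of (u8 len, name bytes) records.
 * Hypothetical layout, for illustration only. */
struct toy_dir {
	unsigned char buf[4096];
	size_t used;
};

/* Append an entry.  All validation happens before anything is written,
 * so a rejected name leaves the directory exactly as it was. */
static int toy_add_entry(struct toy_dir *dir, const char *name, size_t len)
{
	assert(dir != NULL);
	if (len == 0)
		return -EINVAL;
	if (len > MAX_NAME_LEN)
		return -ENAMETOOLONG;
	if (dir->used + 1 + len > sizeof dir->buf)
		return -ENOSPC;
	dir->buf[dir->used] = (unsigned char)len;
	memcpy(dir->buf + dir->used + 1, name, len);
	dir->used += 1 + len;
	return 0;
}
```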
2008-09-19 01:52 oh i see 2008-09-19 01:52 they certainly aren't fixed size 2008-09-19 01:53 hm one byte lengths 2008-09-19 01:53 true, useful 2008-09-19 01:53 qstr 2008-09-19 01:53 yeah 2008-09-19 01:53 was looking at that earlier 2008-09-19 01:53 len is int 2008-09-19 01:54 it's checked somewhere but you're right 2008-09-19 01:54 ext3 is violating, not checking it 2008-09-19 01:54 ext2 2008-09-19 01:54 __d_path or something 2008-09-19 01:54 playing fast and loosey goosey 2008-09-19 01:55 er no 2008-09-19 01:55 where do the dentries get created? 2008-09-19 01:55 i guess i should look top-down 2008-09-19 01:55 start with sys_open 2008-09-19 01:56 rather than bottom up 2008-09-19 01:56 somewhere in path_walk 2008-09-19 02:01 ah yeah, just got there 2008-09-19 02:01 3440 objp = ____cache_alloc(cache, flags); 2008-09-19 02:01 :p 2008-09-19 02:01 damn thats twisty 2008-09-19 02:01 guy who invented slab also invented zfs 2008-09-19 02:02 eh? 2008-09-19 02:02 course I doubt he wrote four underbars there 2008-09-19 02:02 true 2008-09-19 02:03 this is by way of checking whether kmalloc returns ERR_PTR or just NULL on error 2008-09-19 02:03 seems to be the latter 2008-09-19 02:03 http://lxr.linux.no/linux+v2.6.26.5/fs/namei.c#L869 2008-09-19 02:03 maze naively assumed otherwise, of course maze is right and we are wrong 2008-09-19 02:04 but he have to match linux fart for fart 2008-09-19 02:05 -!- kushal(~kushal@121.246.36.162) has joined #tux3 2008-09-19 02:07 so i got all the way down to http://lxr.linux.no/linux+v2.6.26.5/fs/dcache.c#L1241 2008-09-19 02:07 but i can't find the damn length check 2008-09-19 02:07 let me know ;) 2008-09-19 02:07 try get_name 2008-09-19 02:09 bah better to do this during the day when europeans are sleep and lxr is fast 2008-09-19 02:10 heh 2008-09-19 02:10 you're poking in the right place 2008-09-19 02:10 by the time lxr loads i've lost my train of thought 2008-09-19 02:10 right 2008-09-19 02:10 it would be useful to install it 2008-09-19 
02:10 then you can teach me 2008-09-19 02:11 involves postgres & mod_perl 2008-09-19 04:15 -!- kushal_(~kushal@121.246.36.194) has joined #tux3 2008-09-19 05:13 -!- kushal(~kushal@121.246.36.194) has joined #tux3 2008-09-19 07:52 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-19 08:34 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-19 09:27 -!- guile(~guile@89-159-217-245.rev.numericable.fr) has joined #tux3 2008-09-19 09:28 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-09-19 09:35 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-19 10:46 -!- Kirantpatil(~kiran@122.167.197.109) has joined #tux3 2008-09-19 10:46 -!- Kirantpatil(~kiran@122.167.197.109) has left #tux3 2008-09-19 11:09 actually whether I had used IS_ERR and ERR_PTR correctly was something else I'd been meaning to ask ;-) 2008-09-19 11:12 flipz: "Ext3cow was designed as a platform for regulatory compliance, and has been used to implement secure deletion, authenticated encryption, and incremental authentication. See the publications page for more details." 2008-09-19 11:13 http://www.ext3cow.com/Publications.html 2008-09-19 11:13 about idle - not lame - idle class is not meant to affect anything else using io, neither b/w wise nor latency wise - hence it has to be very conservative, if that's not what you want... don't use idle (and even then idle can still impact performance of non-idle tasks...) 2008-09-19 11:14 and ionice has more classes than just idle 2008-09-19 11:15 exactly two more 2008-09-19 11:15 although it's not as powerful as it could/should be 2008-09-19 11:15 although i think idle is the most appealing one 2008-09-19 11:16 its common to want to do io intensive tasks in the background like backups or whatnot 2008-09-19 11:16 yeah, using kvm now 2008-09-19 11:16 ionice'ing a kvm session? 
2008-09-19 11:16 mind you of course, all my printk's are non-multithreaded-printk compatible - who cares ;-) [for now] 2008-09-19 11:18 I put the wq in the bio, cause I needed something to wait on... was there something else I could wait on, and wake up from the endbio func? 2008-09-19 11:18 uhm, what's wrong with just putting the wq there? what use is the extra level of indirection? 2008-09-19 11:19 If you do install your own lxr - pass links to it ;-) 2008-09-19 11:19 we don't need all kversions 2008-09-19 11:19 dd can write bytes without trunc 2008-09-19 11:20 ah, there it is in the log - still catchingup 2008-09-19 11:20 ;) 2008-09-19 11:20 hey what did you expect... I have no bloody idea what I'm doing ;-) [about the wrong things being doozers] 2008-09-19 11:20 hrm it would be cool if lxr could be tied to a git repo 2008-09-19 11:21 and dd is weird... but it works and is everywhere... 2008-09-19 11:21 or is it already 2008-09-19 11:22 you know what is annoying is the number of clicks you need to do to download anything from sourceforge 2008-09-19 11:22 ah yeah it talks to git 2008-09-19 11:25 yeah I was thinking I should be checking both for errors and null... 2008-09-19 11:26 kvm, and ionice, no was referring to running my tests in kvm, like flips is in uml 2008-09-19 11:27 clicks - yeah agreed 2008-09-19 11:27 Ah, caught up.... 
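On the IS_ERR/ERR_PTR question: kernel functions follow one of two conventions. They return either a valid pointer or NULL (kmalloc, per the lxr check above), or a valid pointer or ERR_PTR(-errno). The sketch below re-creates the include/linux/err.h helpers in userspace for illustration. MAX_ERRNO of 4095 matches the kernel's definition, while alloc_or_null and alloc_or_err are hypothetical. Note that IS_ERR(NULL) is false, which is why applying IS_ERR to a NULL-returning function silently misses the failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace copies of the kernel's err.h helpers: the top MAX_ERRNO
 * addresses are never valid pointers, so a negative errno can travel
 * inside a pointer return value. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Convention 1 (like kmalloc): pointer or NULL.  Check with !p. */
static void *alloc_or_null(size_t size)
{
	return malloc(size);
}

/* Convention 2: pointer or ERR_PTR(-errno).  Check with IS_ERR(p). */
static void *alloc_or_err(size_t size)
{
	void *p = malloc(size);
	return p ? p : ERR_PTR(-ENOMEM);
}
```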
2008-09-19 11:27 seems you guys had a productive night 2008-09-19 11:27 mine was as well 2008-09-19 11:27 first time in a long time that I'm not sleepy before noon 2008-09-19 11:27 i was going to ask what the problem was with putting the wq in the bio as well 2008-09-19 11:28 whats the difference if you put a pointer there 2008-09-19 11:28 well, I need both the wq, and a bool 2008-09-19 11:28 so I put in a pointer to a struct with both 2008-09-19 11:28 (also should probably have an error return field in there) 2008-09-19 11:30 ok, back to work 2008-09-19 11:56 +	bio->bi_io_vec[bio->bi_vcnt] = (struct bio_vec){ 2008-09-19 11:56 +		.bv_page = virt_to_page(buf), 2008-09-19 11:56 +		.bv_offset = offset_in_page(buf), 2008-09-19 11:56 +		.bv_len = SB_SIZE }; 2008-09-19 11:56 +	bio->bi_size = SB_SIZE; 2008-09-19 11:56 +	bio->bi_end_io = end_io_read; 2008-09-19 11:56 +	bio->bi_private = &mz; 2008-09-19 11:56 +	bio->bi_vcnt = 1; 2008-09-19 11:57 either that should be bio->bi_io_vec[0] = ... 2008-09-19 11:57 or bio->bi_vcnt++; 2008-09-19 11:59 plus putting the wq on the stack is stack bloat - isn't that bad if we want 4k stacks? 2008-09-19 12:00 actually, adding some sort of debug stack depth tracking might be useful.
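The suggested fix above (index with bi_vcnt, then increment it) is easy to see with a userspace mock. The struct bio and struct bio_vec below are hypothetical stand-ins carrying only the fields from the pasted patch, not the kernel definitions, and mock_bio_add is an invented name. In-kernel code would normally call bio_add_page() instead.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal mocks of the fields used in the pasted patch (not the real
 * kernel structures). */
struct bio_vec {
	void *bv_page;
	unsigned bv_len;
	unsigned bv_offset;
};

#define MOCK_MAX_VECS 4

struct bio {
	struct bio_vec bi_io_vec[MOCK_MAX_VECS];
	unsigned short bi_vcnt;
	unsigned bi_size;
};

/* Append one segment: store it at index bi_vcnt, bump bi_vcnt, and grow
 * bi_size.  Writing bi_io_vec[bi_vcnt] and then assigning bi_vcnt = 1,
 * as the patch does, only works while the bio starts out empty. */
static int mock_bio_add(struct bio *bio, void *page, unsigned offset,
			unsigned len)
{
	assert(bio != NULL);
	if (bio->bi_vcnt >= MOCK_MAX_VECS)
		return -1;
	bio->bi_io_vec[bio->bi_vcnt++] = (struct bio_vec){
		.bv_page = page, .bv_offset = offset, .bv_len = len };
	bio->bi_size += len;
	return 0;
}
```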
2008-09-19 12:00 record deepest spot on stack ever hit in your code 2008-09-19 12:00 hmm, maybe the kernel already does that automatically 2008-09-19 12:07 -!- nataliep(~nataliep@207.47.98.129.static.nextweb.net) has joined #tux3 2008-09-19 12:12 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-19 13:16 maze, ping 2008-09-19 13:17 maze, your ERR_PTR is mostly wrong, where you've applied it to functions that only return ptr or null 2008-09-19 13:17 entirely wrong to be precise ;) 2008-09-19 13:19 maze, putting the work queue in the bio is inherently fragile, the bio can disappear 2008-09-19 13:19 put a pointer to the work queue in the bio 2008-09-19 13:20 right, the wq is in a pointer in the bio 2008-09-19 13:20 yeah checking whether err-ptr is required or not was a todo 2008-09-19 13:20 but of course that's actually not documented anywhere 2008-09-19 13:20 useful, like at the top of said functions 2008-09-19 13:21 the way you've done it, you've got double indirection - I've got single 2008-09-19 13:21 maze, it would be very cool if lxr could be tied to a repo... think about versioned indexes :) 2008-09-19 13:21 heh 2008-09-19 13:21 you either indirect on the wait, or indirect on the complete 2008-09-19 13:22 the wait indirection will be executed more often than the complete 2008-09-19 13:23 bio->bi_vcnt++ would be an improvement 2008-09-19 13:24 the mz goes on the stack anyway 2008-09-19 13:24 if we're that tight for stack space we should not be doing 4K stacks 2008-09-19 13:24 (which people are slowly learning) 2008-09-19 13:31 Iceweasel can't find the server at m.a.z.e.pl. 2008-09-19 13:38 maze, by the way, a wait queue is tiny 2008-09-19 13:38 just a spinlock and a list 2008-09-19 13:38 yeah my university is migrating to a new building 2008-09-19 13:38 why does the mz go on stack if it's kmalloc'ed? 
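maze's arrangement, with bi_private pointing at a submitter-owned struct that holds a wait primitive, a done flag and an error field, can be modeled in userspace with pthreads. This is an analog, not kernel code: the mutex/condvar pair stands in for the wait queue, sync_io_end plays the role of bi_end_io, and all names are hypothetical. Because the submitter does not return until done is set, the struct can live on its stack. The kernel's struct completion packages essentially the same idea.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* What bi_private would point at: wait primitive, done flag, error. */
struct sync_io {
	pthread_mutex_t lock;
	pthread_cond_t wait;
	bool done;
	int error;
};

#define SYNC_IO_INIT { .lock = PTHREAD_MUTEX_INITIALIZER, \
		       .wait = PTHREAD_COND_INITIALIZER,  \
		       .done = false, .error = 0 }

/* Completion side (the bi_end_io role): record the result, wake waiter. */
static void sync_io_end(void *private, int error)
{
	struct sync_io *io = private;

	assert(io != NULL);
	pthread_mutex_lock(&io->lock);
	io->error = error;
	io->done = true;
	pthread_cond_signal(&io->wait);
	pthread_mutex_unlock(&io->lock);
}

/* Submitter side: block until sync_io_end() has run, return its error. */
static int sync_io_wait(struct sync_io *io)
{
	pthread_mutex_lock(&io->lock);
	while (!io->done)		/* loop guards against spurious wakeups */
		pthread_cond_wait(&io->wait, &io->lock);
	int error = io->error;
	pthread_mutex_unlock(&io->lock);
	return error;
}

/* Hypothetical "device" finishing the I/O on another thread. */
static void *fake_device(void *arg)
{
	sync_io_end(arg, 0);
	return NULL;
}
```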
2008-09-19 13:39 my mz wan't kmalloced 2008-09-19 13:39 oh 2008-09-19 13:39 can't see your original code any more 2008-09-19 13:39 good to post that kind of thing to the list 2008-09-19 13:39 it was a nice hack 2008-09-19 13:39 very nice 2008-09-19 13:39 which hack? 2008-09-19 13:39 junkfs 2008-09-19 13:39 oh 2008-09-19 13:40 too bad bio is such a bloaty interface 2008-09-19 13:40 not easy to make useful helpers for it 2008-09-19 13:42 I have ;-) 2008-09-19 13:42 int bioio(int rw, dec_t dev, sector_t sector, unsigned size, 2008-09-19 13:42 endio_t endio, void *private, unsigned vecs, struct page *page, 2008-09-19 13:42 unsigned off, unsigned len); 2008-09-19 13:42 :p 2008-09-19 13:43 yeah, should probably call it something like synchronous_bio_io 2008-09-19 13:44 can shell it as syncbio 2008-09-19 13:44 or synchbio standing for synch[ronous]_b[io]_io 2008-09-19 13:44 don't want sync since that means something else 2008-09-19 13:44 not really 2008-09-19 13:44 it's just a part of a sync 2008-09-19 13:44 well, it won't sync to disk 2008-09-19 13:44 oh, wait it will 2008-09-19 13:44 it will 2008-09-19 13:44 uhm, even if it's an lvm volume? 2008-09-19 13:44 syncbio is the one 2008-09-19 13:45 yes 2008-09-19 13:45 right it will 2008-09-19 13:45 the page cache is above this level 2008-09-19 13:45 things get screwy when virtual block devices cache 2008-09-19 13:45 which some do 2008-09-19 13:45 I'm still not quite clear on how to do barriers and permit reordering in the elevator at this level 2008-09-19 13:45 and theyget screwy 2008-09-19 13:45 barriers are a big mess 2008-09-19 13:46 mostly we just close our eyes and try to do simple things 2008-09-19 13:46 but anyway 2008-09-19 13:46 as you can tell ... 
I don't do/like simple 2008-09-19 13:46 a barrier is a flag on any bio 2008-09-19 13:46 bad idea actually 2008-09-19 13:46 I like powerful - shoot yourself in the foot things ;-) 2008-09-19 13:46 barrier should be separate bio 2008-09-19 13:46 but a barrier should be more like a pointer to another bio(s) which should be first 2008-09-19 13:46 this write must happen after those writes 2008-09-19 13:46 maybe 2008-09-19 13:47 a new barrier api would be a nice contribution 2008-09-19 13:47 no reason for barriers between fs'es on two partitions on the same bdev 2008-09-19 13:47 current one is teh suck 2008-09-19 13:47 and I don't think you need barriers on read... 2008-09-19 13:47 you do 2008-09-19 13:47 I'll even remember why 2008-09-19 13:47 not badly 2008-09-19 13:47 why? something net related? 2008-09-19 13:47 but its the same as memory ops 2008-09-19 13:48 need all combinations if you look hard enough 2008-09-19 13:48 oh, but then it's a these rights must hit disk before this read 2008-09-19 13:48 s/rights/writes/ 2008-09-19 13:48 that kind of thing 2008-09-19 13:48 barriers between readS? 2008-09-19 13:48 barries between reads... hmm 2008-09-19 13:48 I'd think no 2008-09-19 13:48 probably tackling something at the wrong level 2008-09-19 13:48 I can see writes -> barrier -> writes/reads 2008-09-19 13:49 I don't see reads -> barrier -> reads, nor reads -> barrier -> writes 2008-09-19 13:49 although I guess what exactly should happen if read, write to same sector gets reordered... hmm. 2008-09-19 13:49 the arrow directions are ambiguous 2008-09-19 13:49 arrows pointing out time 2008-09-19 13:49 flow 2008-09-19 13:49 commas do that ;) 2008-09-19 13:50 will we really have to fix the bdev interface first? 2008-09-19 13:50 I can see reads/barrier/writes 2008-09-19 13:50 but not a strong case 2008-09-19 13:50 the bdev barrier interface? 
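The idea of a barrier as "a pointer to another bio(s) which should be first" can be sketched as explicit per-request dependencies. This is a hypothetical interface, not Linux's (where, as noted above, a barrier is a flag on a bio): each request lists the requests that must complete before it, and the scheduler may otherwise reorder freely. pick_next uses lowest-sector-first as a crude stand-in for elevator ordering, and all names are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 4

/* A request that may not be issued until every request in deps[] has
 * completed.  Requests with no dependency path between them can be
 * reordered freely. */
struct req {
	unsigned long sector;
	size_t ndeps;
	size_t deps[MAX_DEPS];	/* indices into the request array */
	bool done;
};

static bool req_ready(const struct req *r, const struct req *all)
{
	for (size_t i = 0; i < r->ndeps; i++)
		if (!all[r->deps[i]].done)
			return false;
	return true;
}

/* Return the index of the lowest-sector request that is not done and
 * whose dependencies are all done, or n if nothing can be issued. */
static size_t pick_next(const struct req *reqs, size_t n)
{
	size_t best = n;

	assert(reqs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (reqs[i].done || !req_ready(&reqs[i], reqs))
			continue;
		if (best == n || reqs[i].sector < reqs[best].sector)
			best = i;
	}
	return best;
}
```

With two journal blocks (sectors 100 and 101) and a commit block at sector 50 that depends on both, plain sector order would issue the commit first; the dependency check defers it until both journal blocks are done, while an unrelated write at sector 10 still goes first.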
2008-09-19 13:50 yes, it's naive 2008-09-19 13:50 well, and the prio interface 2008-09-19 13:51 there is none 2008-09-19 13:51 get it all kind of nice and usable 2008-09-19 13:51 the prio ideas are just a hack in one elevator option 2008-09-19 13:51 we (I?) need a bdev interface which is aio read/write scatter/gather with priorities htb-like and barriers 2008-09-19 13:52 true 2008-09-19 13:52 be happy to work on it with you 2008-09-19 13:52 a lot of it is there 2008-09-19 13:52 a lot isn't 2008-09-19 13:52 I have plenty of apps 2008-09-19 13:52 starting with media... 2008-09-19 13:53 how exactly barriers should work is an interesting question 2008-09-19 13:53 yes 2008-09-19 13:53 you don't want them to be too strong 2008-09-19 13:53 or awkward 2008-09-19 13:53 but strong enough to implement the consistency the fs needs 2008-09-19 13:53 you want it to solve the primary problem well, which is journalling 2008-09-19 13:53 exactly 2008-09-19 13:54 and it has to take into consideration real world disks 2008-09-19 13:54 and the fact they spin/seek - something to be very aware of when working on this, since impacts priorities much 2008-09-19 13:54 and you might desire consistency x-dev 2008-09-19 13:54 need to write to journal dev before hitting base dev 2008-09-19 13:54 maze, notice there is nothing read-specific about your endio 2008-09-19 13:55 *cute* 2008-09-19 13:55 needs a different name 2008-09-19 13:55 I know 2008-09-19 13:55 hey it was a hack ;-) 2008-09-19 13:55 not any more 2008-09-19 13:55 hehe 2008-09-19 13:55 right 2008-09-19 13:56 maze, I don't know what I was going on about with your bio private pointer, your usage is fine 2008-09-19 13:56 on the stack is more leet 2008-09-19 13:57 kmallocs are bad things 2008-09-19 13:57 I don't know 2008-09-19 13:57 fragment 2008-09-19 13:57 fragile 2008-09-19 13:57 stack is small nowadays 2008-09-19 13:57 not that small 2008-09-19 13:57 yeah, I've wanted to see exactly how much stack space I actually have 2008-09-19 
13:57 for my leet new fs idea, I actually need to be very careful 2008-09-19 13:58 sure 2008-09-19 13:58 since with both a net layer and an fs layer it might get tight 2008-09-19 13:58 but this is on the other side of too careful 2008-09-19 13:58 mayhaps 2008-09-19 13:58 I'm still new ;-) 2008-09-19 13:58 you are? 2008-09-19 13:59 I think you're past 50 percentile in hacking skills of people who call themselves that 2008-09-19 13:59 kernel hacking 2008-09-19 13:59 another couple months will get you past 90 2008-09-19 13:59 I still have no idea about anything yet ;-) 2008-09-19 13:59 you think anybody else does? 2008-09-19 13:59 how'd we get all that crap in kernel if anybody had a clue? 2008-09-19 14:00 oh one thing, there are a few null statements in your code that you may not think are there 2008-09-19 14:01 extra semicolons 2008-09-19 14:05 oh, well I like semicolons 2008-09-19 14:05 think every } should be followed by a ; 2008-09-19 14:06 heading to grab lunch 2008-09-19 14:06 (except in } else {) 2008-09-19 14:06 and C/C++ just has bad syntax with semicolon 2008-09-19 14:06 s 2008-09-19 14:07 I assume they'll be gone ;) 2008-09-19 14:08 extra parents and curlies are also frowned at, but extra semicolons are cause for shouting 2008-09-19 14:08 extra parens I mean 2008-09-19 14:08 extra parents are probably ok, particularly in utah 2008-09-19 14:11 let em shout 2008-09-19 14:12 hmm, I wonder 2008-09-19 14:12 is it true that removing a semicolon will either 2008-09-19 14:12 a) result in code functioning the exact same way as before 2008-09-19 14:12 or 2008-09-19 14:12 b) result in a compile failure 2008-09-19 14:13 probably noy 2008-09-19 14:13 extra semicolons make the code more fragile 2008-09-19 14:13 you can get a big surprise if somebody adds a seemingly innocuous conditional 2008-09-19 14:14 in theory there is no effect on generated code 2008-09-19 14:14 in practice, theory and practice are different 2008-09-19 14:15 yeah while I know ';' is a statement 
seperator, I much prefer to think of them as end-of-statement markers 2008-09-19 14:15 hmm 2008-09-19 14:15 closet pascal groupie ;) 2008-09-19 14:15 maybe I got that backwards 2008-09-19 14:15 well 2008-09-19 14:15 I love pascal syntax 2008-09-19 14:15 soor 2008-09-19 14:15 sorry 2008-09-19 14:16 yes, backwards 2008-09-19 14:16 but still a closet pascaller I think 2008-09-19 14:16 whatever - everything should end with a ';' 2008-09-19 14:16 semicolons are stupid 2008-09-19 14:16 should be optional 2008-09-19 14:16 designers of c are/were stupid 2008-09-19 14:16 but since they are there, have to use them lindentally 2008-09-19 14:16 imho should be required ;-) 2008-09-19 14:17 every line should be required to have two semicolons, one at the beginning, one at the end 2008-09-19 14:17 because you need them anyway, - can't live without em 2008-09-19 14:17 nah 2008-09-19 14:17 every statement should end with a ';' 2008-09-19 14:18 might be at the end of line, might be in the middle, might extend into the next line 2008-09-19 14:18 whitespace shouldn't matter (although could cause compiler warnings) 2008-09-19 14:18 e l s e 2008-09-19 14:19 anway 2008-09-19 14:19 let's not go there ;) 2008-09-19 14:19 else should always be } else { 2008-09-19 14:19 you either have a simple if (blah) something; 2008-09-19 14:19 not considered lindenty to have curlies around single statements 2008-09-19 14:19 or an if () { ... } else { ... 
}; 2008-09-19 14:19 not saying I think that's good or bad, it's just not lindenty 2008-09-19 14:20 yeah, I know 2008-09-19 14:20 my personal opinion, is: 2008-09-19 14:20 either it's short and sweet fits on a line if (something) something; 2008-09-19 14:21 or should be the full multi-line if () {\n ...\n } else {\n ...\n };\n 2008-09-19 14:21 my personal opinion is, if it's written in C is going to look ugly and there is little you can do about it 2008-09-19 14:21 possibly without the else clause if not needed 2008-09-19 14:21 break your heart trying 2008-09-19 14:21 true 2008-09-19 14:21 folks 2008-09-19 14:22 yes, fixing C is something I've thought of, codenamed 'the language advanced', a curious mix of pascal/c/c++/java/asm/gnu-isms/lisp 2008-09-19 14:22 but besides thinking about it never got anywhere 2008-09-19 14:22 (never tried) 2008-09-19 14:22 bh: hey 2008-09-19 14:22 your brainpower is needed more badly elsewhere ;) 2008-09-19 14:22 hehe 2008-09-19 14:23 but if you write the language first, you can then write the kernel in a language which doesn't suck... 
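maze's question above (does deleting a semicolon either change nothing or break the build?) has a simple counterexample: a stray semicolon after an if makes the if control an empty statement. The two functions below differ by a single ';', yet both compile; GCC's -Wempty-body (enabled by -Wextra) warns about the first. The fragility mentioned in the log also shows up as `if (x) { ... }; else { ... }`, where the extra ';' ends the if statement and leaves a dangling else that fails to compile. The function names are illustrative only.

```c
#include <assert.h>

/* The ';' right after the condition is an empty statement, so the if
 * controls nothing and the increment always runs. */
static int with_stray_semicolon(int cond)
{
	int n = 0;
	if (cond);
		n++;
	return n;
}

/* Delete that one ';' and the increment becomes conditional. */
static int without_it(int cond)
{
	int n = 0;
	if (cond)
		n++;
	return n;
}
```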
2008-09-19 14:23 you can and nobody will care 2008-09-19 14:23 agreed 2008-09-19 14:23 still an interesting exercise 2008-09-19 14:23 a disappear for years exercise 2008-09-19 14:23 true 2008-09-19 14:24 hence the 'haven't ever tried' part 2008-09-19 14:24 save it for when you're old 2008-09-19 14:24 show those whippersnappers 2008-09-19 14:24 I'm hoping someone else will do it 2008-09-19 14:24 they will 2008-09-19 14:24 or I'll get some smart/bright friends and students to do it 2008-09-19 14:24 there's always somebody with enough time on their hands to write an os from scratch 2008-09-19 14:25 they get 15 minutes of slashdot fame and a nice job where they can stew 2008-09-19 14:25 hehe 2008-09-19 14:25 if/when I go back to school to get my phd, I've been thinking about leading a course for some of the best'n'brightest with design and implementation of a language or os as the topic 2008-09-19 14:26 you can practice here 2008-09-19 14:26 anyway, back to earth 2008-09-19 14:26 you're already TA at tux3u 2008-09-19 14:26 sure 2008-09-19 14:26 got to think about the next level for junkfs/tux3fs 2008-09-19 14:26 right 2008-09-19 14:27 right now I'm trying to think of what I want from the mm subsystem for my fs 2008-09-19 14:27 it's cool how tux3fs is both ramfs and diskfs now, hmm? 2008-09-19 14:27 hehe 2008-09-19 14:27 that's the most instructive thing so far 2008-09-19 14:27 re vfs 2008-09-19 14:28 and at which layer of the vfs (generics for most ops or not) the interfaces need to happen 2008-09-19 14:28 linux kinda gets it right, it's just warty 2008-09-19 14:28 also... 
error trapping 2008-09-19 14:29 I'd like to see a stack unwinding/resource recovery discipline 2008-09-19 14:29 also wondering if implementing reads by userspace (with appropriate aligned buffer) by unmap and map in ro cow pages from page cache or somewhere else would be appropriate and fast and/or sow 2008-09-19 14:29 slow 2008-09-19 14:29 it would be appropriate even on linux 2008-09-19 14:29 and if there's any race there 2008-09-19 14:29 there are cases where it's slow, but in general it's powerful 2008-09-19 14:29 just linux isn't orgainized that way 2008-09-19 14:30 linux has a loopy approach 2008-09-19 14:30 very naive 2008-09-19 14:30 would get zero copy reads, and most of the time you don't write over that data 2008-09-19 14:30 as in... too many loops 2008-09-19 14:30 yes, and it gets even more fun with net + disk + buffer + both ways 2008-09-19 14:30 exactly 2008-09-19 14:30 hence I'm thinking of this as a two layer fs to begin with 2008-09-19 14:30 that's why nobody has been crazy enough to attempt it 2008-09-19 14:30 look a splice 2008-09-19 14:30 simple thing 2008-09-19 14:31 big disaster 2008-09-19 14:31 I'm not actually sure where splice is atm? 2008-09-19 14:31 what happened there? last I knew there was an exploit and fix and exploit and fix... 
2008-09-19 14:31 freesearch lxr 2008-09-19 14:32 it's a feature build on a base of jello 2008-09-19 14:33 there does appear to be a way to get your own page to trigger on all sorts of page operations, so that's good 2008-09-19 14:33 oh yes 2008-09-19 14:33 it's fun 2008-09-19 14:33 meant code not page 2008-09-19 14:33 "stupid page tricks" 2008-09-19 14:34 yeah, but I'm guessing it's needed for decent cache coherency 2008-09-19 14:34 even if it will mean locking will effectively end up being page (not byte-range) based 2008-09-19 14:35 anyway, I figure it's important to know what's possible, to know what can be later implemented, and to design the possibility in from the start 2008-09-19 14:35 you'll be getting into vm soon enough 2008-09-19 14:36 you can help me with the variable page rewrite if you like 2008-09-19 14:36 linus said he would open 2.7 if I did that hack 2008-09-19 14:36 I've realized that the xattr interface can probably be used as a nice ioctl layer for the fs 2008-09-19 14:36 it can? 2008-09-19 14:36 yeah like setting fs.tux3.option = something on an inode 2008-09-19 14:37 and then reading it back 2008-09-19 14:37 have the stuff be auto-generated 2008-09-19 14:37 ioctls would not be pleasant for that 2008-09-19 14:37 and have options for stuff like type of optimizations to be used on this file or etc 2008-09-19 14:37 we have ddlink for that 2008-09-19 14:37 I think xattr is nice here - although haven't looked at ddlink 2008-09-19 14:37 reiser5 ;) 2008-09-19 14:37 hmm? 2008-09-19 14:37 ddlink is cool 2008-09-19 14:38 really cook 2008-09-19 14:38 cool 2008-09-19 14:38 reiser5? what's with reiser4? 2008-09-19 14:38 "dead" 2008-09-19 14:38 is reiser even being worked on? 2008-09-19 14:38 slowly 2008-09-19 14:38 very slowly 2008-09-19 14:39 is reiser 4 done? stable? dropped? 
2008-09-19 14:39 quasi stable 2008-09-19 14:40 should be merged, under a different name imho 2008-09-19 14:41 chris mason was one of the big driving forces on reiser, at least reiser 3, and he's entirely devoted to btrfs now 2008-09-19 14:41 which is something like reiser 3.5 2008-09-19 15:00 ah 2008-09-19 15:00 I came up with a few interesting network fs related ideas last night 2008-09-19 15:00 was a very productive bath ;-) 2008-09-19 15:01 works for me too, showers though 2008-09-19 15:01 something about that running water 2008-09-19 15:01 settles the lame ideas, let's bouyant ones float to the top 2008-09-19 15:11 maze, 2008-09-19 15:11 while (vecs--) 2008-09-19 15:11 bio->bi_io_vec[bio->bi_vcnt++] = va_arg(args, struct bio_vec); 2008-09-19 15:13 int bio(int rw, dev_t dev, sector_t sector, bio_end_io_t endio, void *private, unsigned vecs, ...) 2008-09-19 16:57 what are you folks going to finished the file system ? 2008-09-19 16:57 what=when 2008-09-19 16:57 are were there yet ? 2008-09-19 16:58 ACTION grins 2008-09-19 17:31 gregkh is an idiot 2008-09-19 17:32 oh was that public 2008-09-19 17:32 http://dustinkirkland.wordpress.com/2008/09/18/whats-behind-gregkhs-latest-rant/ 2008-09-19 17:32 wouldn't be so bad if he could design, code or debug 2008-09-19 17:33 bh, we're getting closer 2008-09-19 17:33 the kernel port is getting a little attention 2008-09-19 17:33 needs a lot more 2008-09-19 17:56 true fact: the linux kernel makefile is 1600 lines long 2008-09-19 18:01 541 KBUILD_CFLAGS += $(call cc-option,-Wdeclaration-after-statement,) <- this is the line we kill to enable inline decls 2008-09-19 18:01 I guess we are going to do taht 2008-09-19 18:01 for now until just before merge 2008-09-19 18:09 sk8 oclock 2008-09-19 18:09 one could say sk8teen oclock 2008-09-19 18:18 -!- BSD(~bandan@70-4-203-156.area3.spcsdns.net) has joined #tux3 2008-09-19 18:26 -!- BSD(~bandan@70-4-203-156.area3.spcsdns.net) has joined #tux3 2008-09-19 19:02 -!- 
BSD(~bandan@70-4-203-156.area3.spcsdns.net) has joined #tux3 2008-09-19 19:46 -!- BSD(~bandan@68-244-245-217.area3.spcsdns.net) has joined #tux3 2008-09-19 19:57 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-09-19 20:42 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-19 20:51 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-19 21:20 Results 1 - 10 of about 283,000 for tux3. 2008-09-19 21:20 up 100k in a day 2008-09-19 21:20 wonder what happened 2008-09-19 21:21 100k...hits? 2008-09-19 21:21 musta been the waking post 2008-09-19 21:21 100k up in one day, yes 2008-09-19 21:21 damn 2008-09-19 21:21 the internet loves wanking I guess 2008-09-19 21:21 hey, did you guys want a faster lxr? 2008-09-19 21:21 very much 2008-09-19 21:22 ok, i just got one going on my home quad, i gotta tweak postgres and we should be ready to rock 2008-09-19 21:22 excellent 2008-09-19 21:22 of course bandwidth might be a problem 2008-09-19 21:22 your admin skillz rock 2008-09-19 21:22 shapor can fix that 2008-09-19 21:22 shap can fix anything, that bastard! :) 2008-09-19 21:23 truth 2008-09-19 21:23 do you need the free text searches? that's some extra software that i'd have to get/configure 2008-09-19 21:23 oh yes 2008-09-19 21:23 the whole enchilada 2008-09-19 21:23 freetext is essential 2008-09-19 21:23 ok, i'll go play with that 2008-09-19 21:24 thanks much 2008-09-19 21:24 have to run out before whole foods closes 2008-09-19 21:24 didnt know, i just started dicking around with it to remind myself what sysadmining was like in linux ;) 2008-09-19 21:24 or I don't get my sushi tonight 2008-09-19 21:24 go get sushi, that's a moral imperative 2008-09-19 21:24 it's a mess, isn't it. 
LXR install I mean 2008-09-19 21:24 it aint pretty, but than again i dont do normal...anything 2008-09-19 21:25 that's a good sign 2008-09-19 21:25 bbiaf 2008-09-19 21:26 lol 2008-09-19 21:43 Bushman: what kind of bandwidth do you have? 2008-09-19 21:43 ACTION puts head down in shame 2008-09-19 21:43 cable modem 2008-09-19 21:44 its good enough 2008-09-19 21:44 most requests are only a few 10's of kb probably 2008-09-19 21:44 i can 'pimp my apache' and turn on compression since it's text it should be alright 2008-09-19 21:44 by looking at the code cpu is gonna be the bottle neck first ;) 2008-09-19 21:45 yea the db design makes baby jesus cry 2008-09-19 21:45 so i gotta go through postgres configs first, which can take hours to do right, that shit is complicated 2008-09-19 21:46 i'm i can get a dual Dual Core Woodcrest 2008-09-19 21:46 for $120/mo 2008-09-19 21:46 i might do that 2008-09-19 21:47 since i'm finally more than breaking even with my current servers 2008-09-19 21:47 the database for a whole 2.6.26.5 tree is about 1.1gb, so i'll just shove the whole thing into memory 2008-09-19 21:47 how much ram does it have? 2008-09-19 21:47 braking even? what are you hosing? 2008-09-19 21:47 6gb ;) 2008-09-19 21:47 a few sites 2008-09-19 21:48 it's my Matlab cruncher ;) 2008-09-19 21:48 well a few paying sites 2008-09-19 21:48 persiankitty is back? :) 2008-09-19 21:48 haha 2008-09-19 21:48 then various freebees like zumastor.org 2008-09-19 21:49 Bushman: you installed lxrng right? 2008-09-19 21:49 thats the one running on lxr.linux.no 2008-09-19 21:49 lxrng? 
i just found lxr-devel 2008-09-19 21:49 i think that ones out of date 2008-09-19 21:49 it's like 0.9.5 2008-09-19 21:49 but if it works thats cool 2008-09-19 21:50 there are annoying bugs in the one running on lxr.linux.no 2008-09-19 21:50 like it will show the same result multiple times 2008-09-19 21:50 see http://lxr.linux.no/ at the bottom of the page 2008-09-19 21:50 step by step instructions for setting it up too ;) 2008-09-19 21:50 well i havent seen the web part of it yet, the first run through the code indexing just ended like 10 mins ago 2008-09-19 21:51 ah cool 2008-09-19 21:51 took a while 2008-09-19 21:51 too bad i dont have any phat hardware at home 2008-09-19 21:51 i have 15/15 fios 2008-09-19 21:51 should i ship you a box? :) 2008-09-19 21:52 how many watts ? ;) 2008-09-19 21:52 if i were to guess it probably sounds like a helicopter 2008-09-19 21:52 never knew you to have an quiet boxes 2008-09-19 21:52 ups says about 90 sitting idle with cpu frequency throttling, 140+ at full boogie 2008-09-19 21:53 so 100 on average 2008-09-19 21:53 it's a bulb 2008-09-19 21:53 so about $8/mon 2008-09-19 21:54 are you all green, or just being a cheapass? 2008-09-19 21:54 little from column a... 2008-09-19 21:54 dont have to answer, i know the answer ;) 2008-09-19 21:54 hm the place i used to live had free electricity and fios available 2008-09-19 21:55 should build a datacenter in it 2008-09-19 21:55 or a grow house ;) 2008-09-19 21:55 ok u turn 2008-09-19 21:55 is that a 'weeds' reference? 2008-09-19 21:55 get me a prius 2008-09-19 21:56 so i can sneak up on mofos real quiet 2008-09-19 21:56 'so i can sneak up on motherfuckers' 2008-09-19 21:56 hah 2008-09-19 21:56 that's a bit spooky 2008-09-19 21:56 annnyway 2008-09-19 21:57 bandwidth shouldnt be a problem 2008-09-19 21:57 if it is, ship it to me ;) 2008-09-19 21:57 cant we just set it up on one of your real servers? 
2008-09-19 21:57 sure 2008-09-19 21:57 but i dont want to kill the cpus 2008-09-19 21:57 we'll see how it goes on cable modem 2008-09-19 21:57 i can put it up on marcintology, but that's just a normal shared web account 2008-09-19 21:57 i can run dyndns for you if you dont have it already 2008-09-19 21:58 oh i got dns for it ;) 2008-09-19 21:58 lxr.tux3.org ? 2008-09-19 21:58 ooh could we pull from ddtree too? 2008-09-19 21:58 ok, let's do that than 2008-09-19 21:58 that would be slick! 2008-09-19 21:58 flipz: you like that idea? 2008-09-19 21:58 well first i wanna see if i can get it working nicely 2008-09-19 21:58 have the ddtree lxr'ed ? ;) 2008-09-19 21:58 flipz is doing sushi 2008-09-19 21:59 ah ok well i'm out for a bit too 2008-09-19 21:59 i sushi'ed m'self for lunch yesterday so i should be good for a day or two before i'm gonna start jonsing again 2008-09-19 21:59 good, i cant work with you making me laugh 2008-09-19 22:08 http://blogs.pcworld.com/staffblog/archives/007783.html 2008-09-19 22:08 take a look at the windows ad at the bottom... 2008-09-19 22:09 some enlightened folk at microsoft snuck in penguins... 2008-09-19 22:09 they're everywhere 2008-09-19 22:12 if needs be I can probably stick lxr on an athlon64 at my univ in poland 2008-09-19 22:12 or, maybe I could host a second copy from home off of comcast 2008-09-19 22:13 ddtree.tux3.org 2008-09-19 22:14 no functionen 2008-09-19 22:14 Host ddtree.tux3.org not found: 3(NXDOMAIN) 2008-09-19 22:15 it was just a suggestion 2008-09-19 22:15 we can make it happen any time 2008-09-19 22:15 haha 2008-09-19 22:16 nice redmond penquins 2008-09-19 22:16 are they that clueless or are there subversives inside m$? 2008-09-19 22:19 I'm guessing subversion from inside 2008-09-19 22:19 but it looks like windows out onto the future to me, penguins everywhere 2008-09-19 22:22 ugh, reading a notebook review and someone is claiming gigabit wired is overkill... 2008-09-19 22:22 what the hell are they drinking? 
2008-09-19 22:22 or smoking 2008-09-19 22:23 "The thing that gets us out of bed every day is the prospect of creating pathways above, around and through walls." msft marketdroid 2008-09-19 22:23 sheesh 2008-09-19 22:23 I'm glad I'm not him 2008-09-19 22:23 or them 2008-09-19 22:23 what gets me out of bed is sheer effort of will 2008-09-19 22:23 and the prospect of some french roast 2008-09-19 22:25 heh, for me it's usually the buzzer and a sense of duty 2008-09-19 22:25 when the family gets back tomorrow it will be a four year old jumping on me 2008-09-19 22:26 "time to get up and play daddy" 2008-09-19 22:27 "An approach dedicated to engineering the absence of anything that might stand in the way" -- my gawd who came up with that one, ballmer? 2008-09-19 22:27 ugh 2008-09-19 22:27 engineering the absence - new msft slogan 2008-09-19 22:31 i'm all for engineering the absence 2008-09-19 22:31 of msft 2008-09-19 22:32 is the stupid question hour still on? 2008-09-19 22:32 yep 2008-09-19 22:33 vm.swappiness, wanna explain it to me, what's it really do, what rules of thumb i wanna use to determine it, etc 2008-09-19 22:33 oh, one of those 2008-09-19 22:34 it's an andrewism 2008-09-19 22:34 did i pick a touchy topic? :) 2008-09-19 22:34 take a vm that just plain doesn't swap very well and bolt knobs on it 2008-09-19 22:34 akpm is a friend 2008-09-19 22:34 but the linux vm has been dire for a few years 2008-09-19 22:34 http://kerneltrap.org/node/3000 2008-09-19 22:34 swappiness is one of the attempted bandaids 2008-09-19 22:36 what's wrong with knobs? better to have them than to not have them, isnt it?
2008-09-19 22:36 better to have it work 2008-09-19 22:36 than to give up and offer a knob which also doesn't work 2008-09-19 22:37 like those old televisions 2008-09-19 22:37 you had a color control that ranged from "very green" to "very red" with "very blue" in between 2008-09-19 22:38 anyway 2008-09-19 22:38 I'm taking a sabbatical from vm 2008-09-19 22:38 so I am licensed to throw turds 2008-09-19 22:42 Bushman: the vm operates in a delicate balance right now with knobs pulling in 4 dimensions 2008-09-19 22:42 this breaks more than you think 2008-09-19 22:43 i've seen memory recursion deadlocks with AoE and ddsnap 2008-09-19 22:43 as well as other wacky behavior like: 2008-09-19 22:43 http://pengaru.com/~swivel/pop_comparisons/04-26-2006/ 2008-09-19 22:44 I don't even want to think about the vm 2008-09-19 22:44 let's stick with fixing fs and bdev 2008-09-19 22:44 Vito sounds like he could use to do some PCA to reduce the dimensionality in his quest for performance 2008-09-19 22:46 its not a lot to ask of a modern server to be able to parse silly protocols like pop at wire speed 2008-09-19 22:46 and cache some stuff 2008-09-19 22:47 thats a case of the os clearly pissing in your cheerios 2008-09-19 22:48 evicting >2GB of buffer cache at a time due to brain damage in the vm 2008-09-19 22:49 shapor: you're spoiled, you need to work with windows for a while ;) 2008-09-19 22:49 no thanks 2008-09-19 22:49 once i reinstalled a SCSI driver and all my fonts went bold 2008-09-19 22:49 wanna explain that one 2008-09-19 22:51 mmm the nigiri was fine, now time to look into the sake issue 2008-09-19 22:51 lol 2008-09-19 22:52 bushman, use after free? 2008-09-19 22:52 in the registry? 2008-09-19 22:53 that was a long time ago, i dont remember. i saw that and i thought i had some bad sushi or something ;) 2008-09-19 22:55 i'm still working on the freetext indexing, do we need just straight text/html parsing, or do we want more like PDF or .doc? 
2008-09-19 22:55 there's no such thing as bad sushi 2008-09-19 22:55 careful there ;) 2008-09-19 22:55 MaZe: let it sit out in the sun for few hours, then you'll experience bad sushi 2008-09-19 22:56 bushman, I didn't know lxr had that knob 2008-09-19 22:56 but straight text, yes 2008-09-19 22:56 it's not lxr, it's swish, the text indexer 2008-09-19 22:56 right, so lxr doesn't recommend a config for swish? 2008-09-19 22:56 I thought they did 2008-09-19 22:57 kinda, but if i'm doing it by hand, might as well pimp it out a bit 2008-09-19 22:57 anyway, text 2008-09-19 22:58 plus Shap tells me that i grabbed a slightly different code than what the norwiegian lxr site is running 2008-09-19 22:58 remember when it was glimpse? 2008-09-19 22:58 ıʞsʍoʞʎzɔuǝż ˙ɐ ɾǝıɔɐɯ - who can read this? 2008-09-19 22:58 a kinda sort open indexer 2008-09-19 22:58 university project 2008-09-19 22:58 the lxr manual says glimpse doesnt support all the functions it wants, so i went with swish 2008-09-19 22:58 they tried to close it up and make some dough, nobody used it, finally somebody wrote swishe and nobody remembers glimpse 2008-09-19 22:59 i've used this russian indexer/search engine called mnogosearch before, if i'm really bored i might see if i can use that here 2008-09-19 23:00 one of the things we plan to get happening with tux3 is proper incremental indexintg 2008-09-19 23:04 why are we leaking the sb inode map in inode.c test?
2008-09-19 23:04 ==31367== 8,160 (8,040 direct, 120 indirect) bytes in 1 blocks are definitely lost in loss record 4 of 7 2008-09-19 23:04 ==31367== at 0x4A1B858: malloc (vg_replace_malloc.c:149) 2008-09-19 23:04 ==31367== by 0x401E44: new_map (buffer.c:452) 2008-09-19 23:04 ==31367== by 0x40A088: new_inode (inode.c:128) 2008-09-19 23:04 ==31367== by 0x40B53C: make_tux3 (inode.c:493) 2008-09-19 23:04 ==31367== by 0x40BA21: main (inode.c:554) 2008-09-19 23:04 because it's broken ;-) 2008-09-19 23:04 shapor, we want to know 2008-09-19 23:05 does seem like the test is broken 2008-09-19 23:05 doesn't* 2008-09-19 23:06 want hints to track it down? 2008-09-19 23:06 put exit(1) somewhere and see if you put it before or after the leak 2008-09-19 23:06 smrt 2008-09-19 23:06 question: 2008-09-19 23:06 if a page-fault gets triggered 2008-09-19 23:07 due to the page being not present or write to read-only page 2008-09-19 23:07 from user space 2008-09-19 23:07 what context do you end up in the kernel? is that considered process context or interrupt context? 2008-09-19 23:07 process 2008-09-19 23:07 but not process 2008-09-19 23:07 it's non-interrupt kernel 2008-09-19 23:08 lol, so what can/can you not do - how does it differ from process and interrupt? 2008-09-19 23:08 can you sleep? 2008-09-19 23:08 any doc pointers? 2008-09-19 23:08 yes, you have to in order to read in the page 2008-09-19 23:08 right - fair point 2008-09-19 23:09 and of course another thread can trigger the same page fault again before you read it in, so you need to lock appropriately 2008-09-19 23:09 or that might happen automagically 2008-09-19 23:09 http://lxr.linux.no/linux+v2.6.26.5/arch/x86/mm/fault.c 2008-09-19 23:09 so how many different types of contexts do we have? 2008-09-19 23:10 process? kthread? fault handler? interrupt? anything else? 2008-09-19 23:10 wtf if i put an exit(1) right before the return 0; in main it doesn't detect a leak 2008-09-19 23:10 try exit(0)? 
2008-09-19 23:10 does it always leak the same way? 2008-09-19 23:10 maybe instead of exit(1) do goto return 0 at end of main? 2008-09-19 23:11 same 2008-09-19 23:11 cant you printf a bunch of usual suspects? 2008-09-19 23:11 exit(0) also no error 2008-09-19 23:11 maze, http://lxr.linux.no/linux+v2.6.26.5/Documentation/exception.txt#L266 2008-09-19 23:11 maybe it bypasses the check code? 2008-09-19 23:11 i'll try moving the return up instead 2008-09-19 23:12 right that's the dealing with exceptions in kernel space 2008-09-19 23:12 shapor, right, valgrind does that 2008-09-19 23:12 and using them to detect unauthorized reads, etc 2008-09-19 23:12 maze, process, kthread and fault handler are all the same 2008-09-19 23:12 shap: you got ida pro handy? 2008-09-19 23:13 oh really? so basically there's just 2: process/kthread/fault vs interrupt 2008-09-19 23:13 the only real difference with process is it has a user address space to work with 2008-09-19 23:13 mm 2008-09-19 23:13 that's actually just a flag bit 2008-09-19 23:13 and the doc you referred to is about faults triggered from the kernel 2008-09-19 23:13 like having a file table 2008-09-19 23:13 ah 2008-09-19 23:14 right, but since we don't have any pointers passed in as parameters from userspace that doesn't really much matter 2008-09-19 23:14 maze, the doc tells you about do_page_fault 2008-09-19 23:15 that it gets called? 2008-09-19 23:15 I know that ;-) 2008-09-19 23:15 but do_page_fault does special stuff if the page fault got triggered with eip in kernel space 2008-09-19 23:16 ah i see whats going on 2008-09-19 23:16 although I guess we can trigger page fault-ins for user pages from kernel space as well 2008-09-19 23:17 with the leak 2008-09-19 23:17 so the two cases shouldn't really differ 2008-09-19 23:17 how would that change anythign? if you need to get a page, gotta fetch it, regardless where EIP is pointing at, isnt it? 
2008-09-19 23:18 we don't support faults in kernel space 2008-09-19 23:18 you don't want to trigger sigsegv from kernel space 2008-09-19 23:18 only object is to oops properly 2008-09-19 23:18 used to panic on that 2008-09-19 23:18 you return EFAULT instead 2008-09-19 23:18 ok so how do you prevent that? 2008-09-19 23:19 wait, so what happens if userspace syscall writes and the buffer I passed in is swapped out? 2008-09-19 23:19 I'd always assumed the page-ins would happen via fault, do we actually map the memory in in some other way? 2008-09-19 23:23 maze, ok it took me a while to remember 2008-09-19 23:23 when a fault occurs, unlike an interrupt it isn't interrupting some random process 2008-09-19 23:23 right 2008-09-19 23:23 it faults in the process that needs to work, so the fault handler uses that context 2008-09-19 23:24 it just has to do a little register fiddling and play with the intruction pointer 2008-09-19 23:24 in some cases, parsing the instruction stream 2008-09-19 23:24 dimly remembering this from the last time I did it, many years ago 2008-09-19 23:24 fault semantics on x86 are utter crap 2008-09-19 23:24 so basically a fault triggered from userspace is almost like a syscall 2008-09-19 23:24 yes 2008-09-19 23:25 besides the fact it can happen anywhere, and needs some special asm on entry/exit to deal with the weird semantics 2008-09-19 23:25 http://lxr.linux.no/linux+v2.6.26.5/arch/sh/kernel/cpu/sh5/entry.S#L1134 2008-09-19 23:25 for example 2008-09-19 23:25 how about a fault triggered on access from within the kernel? 2008-09-19 23:26 oops if you're lucky 2008-09-19 23:26 panic if not 2008-09-19 23:26 and what if its triggered from irq context? 2008-09-19 23:26 death 2008-09-19 23:26 try it ;) 2008-09-19 23:26 it's easy 2008-09-19 23:26 so how does the kernel guarantee the userspace memory accessed by syscalls is present? 
2008-09-19 23:27 it goes delving into page table entries and things 2008-09-19 23:27 do we grab and release locks on the userspace memory before and after the copy (and if not present call the pagein handlers manually?) 2008-09-19 23:27 well 2008-09-19 23:27 it doesn't need to be present 2008-09-19 23:27 because it can fault 2008-09-19 23:27 huh? 2008-09-19 23:27 well 2008-09-19 23:27 sorry 2008-09-19 23:27 not in kernel ;) 2008-09-19 23:27 double huh 2008-09-19 23:27 it does the fault by hand 2008-09-19 23:27 see get_user_pages 2008-09-19 23:28 right, so like I said above at 11:27:10 2008-09-19 23:28 don't have timestamps on 2008-09-19 23:28 (11:27:10 PM) MaZe: do we grab and release locks on the userspace memory before and after the copy (and if not present call the pagein handlers manually?) 2008-09-19 23:29 we don't grab locks on user memory 2008-09-19 23:29 not sure what the question is 2008-09-19 23:29 not on struct page *? 2008-09-19 23:29 no 2008-09-19 23:29 not for that 2008-09-19 23:29 we take ref counts 2008-09-19 23:29 ain't that the same thing? 2008-09-19 23:29 nonzero ref count holds a page in memory 2008-09-19 23:29 no 2008-09-19 23:30 but it prevents the page from disappearing from under us? right? so it's like a lock - except others can access it as well, right 2008-09-19 23:30 see why lock is an inappropriate name here 2008-09-19 23:30 I meant lock in the sense of lock into ram 2008-09-19 23:30 it's not at all like a lock 2008-09-19 23:30 it's a refcount 2008-09-19 23:31 a lock is a serializer, and recount is a don't kill me 2008-09-19 23:31 http://lxr.linux.no/linux+v2.6.26.5/mm/memory.c#L962 <- follow_page, doing the job of mm hardware by hand 2008-09-19 23:32 does refcount = 0 immediately result in memory being destroyed? 2008-09-19 23:32 yes 2008-09-19 23:32 or will it only get evicted if need be at that point? 2008-09-19 23:32 now... 
depends whether it's anon or not 2008-09-19 23:33 anon has to be swept up, page cache is immediately freed at that point 2008-09-19 23:33 don't quote me, I used to hack that stuff ;) 2008-09-19 23:33 but it's been a while 2008-09-19 23:33 so normally a processes mapping holds a refcount on it's memory? 2008-09-19 23:34 but that doesn't work 2008-09-19 23:34 check __free_page 2008-09-19 23:34 there have to be 2 layers here 2008-09-19 23:34 one virtual - what a process needs, one physical - what's in memory 2008-09-19 23:34 that puts the page back on the buddy as soon as count hits zero I believe 2008-09-19 23:34 probaby vma vs page 2008-09-19 23:34 so... treatment of inodes by the vfs is rather different 2008-09-19 23:35 there is one refcount on a page for each pointer to it, basically 2008-09-19 23:35 including one for the lru, I tried to implement, andrew never took the patch 2008-09-19 23:35 but definitely one for the page cache 2008-09-19 23:35 and one for each page table entry pointing at it 2008-09-19 23:35 there aren't really two layers in linux 2008-09-19 23:35 unlike freebsd 2008-09-19 23:36 it's all one layer 2008-09-19 23:36 -!- amey(~amey@116.73.35.180) has joined #tux3 2008-09-19 23:36 vma just specifies some access rights to memory regions 2008-09-19 23:37 hmm 2008-09-19 23:37 you have lots of time to get that sorted 2008-09-19 23:37 has no real impact on fs development 2008-09-19 23:39 am I right in assuming that during kernel startup 2008-09-19 23:39 a fragment of physical memory is reserved for a big honking array 2008-09-19 23:39 of struct pages - one per page of physical memory in the system? including high-mem and all that? 
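The distinction drawn between 23:29 and 23:31 is that a lock serializes, while a refcount only says "don't free me". A toy model of a page-cache page can show this. It holds one reference for the page cache and one per page table entry mapping it, and the frame is released when the last reference goes away. This is a single-threaded userspace sketch, not kernel code. `struct toy_page`, `toy_get` and `toy_put` are hypothetical names, the real counter is updated atomically, and the anonymous-page case mentioned in the log is not modeled.

```c
#include <stdbool.h>

/* Toy page frame: a count of pointers to it plus a "returned to allocator" flag. */
struct toy_page {
	int count;
	bool freed;
};

/* Take a reference: the frame may not be freed while count > 0. */
static void toy_get(struct toy_page *p)
{
	p->count++;
}

/* Drop a reference; the last one hands the frame back. Returns true if freed. */
static bool toy_put(struct toy_page *p)
{
	if (--p->count == 0) {
		p->freed = true;	/* stands in for returning the frame to the buddy allocator */
		return true;
	}
	return false;
}
```

Nothing here stops two holders from touching the page at the same time. That is the job of a separate lock (for pages, the page lock bit), not the refcount.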
2008-09-19 23:40 possibly with some hacks for discontig mem 2008-09-19 23:40 yes 2008-09-19 23:40 it's rather crude 2008-09-19 23:40 and struct page as struct address_space * mapping in it 2008-09-19 23:40 s/as/has/ 2008-09-19 23:41 which appears to be inode related 2008-09-19 23:41 very much so 2008-09-19 23:41 we make every page header have that field even if its anon where the field has no use 2008-09-19 23:42 how big is a struct page - around 50 bytes? 2008-09-19 23:42 address_space, mapping, and page cache are different names for the same thing by the way 2008-09-19 23:42 stupidly sloppy terminology 2008-09-19 23:42 less I think 2008-09-19 23:43 it's been heavily sqzd 2008-09-19 23:43 maybe 50 on 64 bit 2008-09-19 23:43 so we basically throw 1% of memory out for accounting purposes. 2008-09-19 23:43 go into junkfs and printf(... sizeof(struct page)) 2008-09-19 23:43 much more than that 2008-09-19 23:44 and that's not even including the cpu pagetables 2008-09-19 23:44 dentry and inode cache are really extravagant 2008-09-19 23:44 it's not lean and mean 2008-09-19 23:44 only compared to even worse kernels 2008-09-19 23:45 56 2008-09-19 23:45 64 bit, right? 2008-09-19 23:45 yes 2008-09-19 23:46 multiply 56 times 1 TB / 4096 2008-09-19 23:46 kind of wicked how you can use junkfs as a code injector 2008-09-19 23:46 that's the point 2008-09-19 23:46 side door into the kernel for dodgy people 2008-09-19 23:46 14 GB 2008-09-19 23:46 so... imagine the suck when we scan that 2008-09-19 23:47 the sound of sucking is the only thing you hear from that computer 2008-09-19 23:47 scan it for what? 2008-09-19 23:47 anything 2008-09-19 23:47 freeable memory 2008-09-19 23:47 why would we want to scan it? 
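The mem_map arithmetic from 23:42 to 23:49 can be checked directly. The inputs are the 56-byte `sizeof(struct page)` reported from the 64-bit junkfs printf and 4 KiB pages. The overhead is 56/4096, about 1.37% of RAM. That gives 14 GiB for 1 TiB, 1.75 GiB for 128 GiB and 448 MiB for 32 GiB, so the "1-2G" estimate fits the top of the 32-128G range. These figures come before page tables and the dentry and inode caches are counted. The function names are illustrative.

```c
#include <stdint.h>

/* Bytes spent on struct page entries for `ram` bytes of physical memory. */
static uint64_t mem_map_bytes(uint64_t ram, uint64_t page_size, uint64_t struct_page_size)
{
	return ram / page_size * struct_page_size;
}

/* The same overhead as a percentage of RAM; it does not depend on RAM size. */
static double mem_map_percent(uint64_t page_size, uint64_t struct_page_size)
{
	return 100.0 * (double)struct_page_size / (double)page_size;
}
```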
2008-09-19 23:47 oh 2008-09-19 23:47 so there's no heap structure of memory or anything like that 2008-09-19 23:47 nope 2008-09-19 23:47 it's the crudest imaginable system 2008-09-19 23:47 oh, that is indeed quite a vacuum 2008-09-19 23:48 while 1 TB is still rare 2008-09-19 23:48 it was only recently that linus allow vma to be a tree isntead of a linear list 2008-09-19 23:48 32-128G is perfectly reasonable nowadays 2008-09-19 23:49 and at that point we have almos 1-2G of struct page's 2008-09-19 23:49 1 tb is right around the corner 2008-09-19 23:49 argh 2008-09-19 23:49 -!- amey(~amey@116.73.35.180) has left #tux3 2008-09-19 23:49 so basically another part that wasn't designed 2008-09-19 23:49 indeed 2008-09-19 23:50 that comment applies to almost all the parts 2008-09-19 23:50 s/almost// 2008-09-19 23:50 do other kernels get this even worse? 2008-09-19 23:50 yes 2008-09-19 23:50 unbelievable? 2008-09-19 23:50 yes 2008-09-19 23:50 but true 2008-09-19 23:50 possible exception of, oh, qnx 2008-09-19 23:51 I'd always assumed the kernel to be this awesome C/assembler layer of wicked algos and data structures 2008-09-19 23:51 haha 2008-09-19 23:51 welcome, you're a kernel hacker now 2008-09-19 23:51 evrything optimized and tuned to hell and back 2008-09-19 23:51 to hell, not back 2008-09-19 23:51 lol 2008-09-19 23:52 some bits are ok 2008-09-19 23:52 yeah 2008-09-19 23:52 some bits are pretty damm amazing 2008-09-19 23:52 and I'm assuming it is in general getting better over time 2008-09-19 23:52 but most bits are just plain crap 2008-09-19 23:52 hard to say 2008-09-19 23:52 it's getting bigger 2008-09-19 23:53 yes, I've noticed 2008-09-19 23:53 I'm not sure its getting faster, seems to be regressing a little 2008-09-19 23:53 but I've assumed that hasn't been core functionality 2008-09-19 23:53 more just new drivers 2008-09-19 23:53 new filesystems 2008-09-19 23:53 etc 2008-09-19 23:53 also core 2008-09-19 23:53 all the big iron stuff 2008-09-19 23:53 from sgi and ibm 
2008-09-19 23:54 hmm 2008-09-19 23:54 buffer.c and filemap.c get longer and longer 2008-09-19 23:54 so 2.7 happens when we get rid of bh? 2008-09-19 23:54 things like mpage.c appear 2008-09-19 23:54 linus said the variable sized page patch would be enough to open 2.7 2008-09-19 23:54 that was a while ago 2008-09-19 23:54 thing is, I'm not sure I would want 2.7 to happen 2008-09-19 23:54 2.5 was an utter mess 2008-09-19 23:55 don't know how much he treats an email as a promise ;) 2008-09-19 23:55 and 2.6. up to 2.6.7 or so was junk 2008-09-19 23:55 going through that again would be painful 2008-09-19 23:55 2.6 is kinda starting to stink 2008-09-19 23:55 it was fresh and new once 2008-09-19 23:56 well 2008-09-19 23:56 it's different 2008-09-19 23:56 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-09-19 23:56 in 2.3/4/5 we had a desparate situation 2008-09-19 23:56 nobody had a kernel that worked properly 2008-09-19 23:56 otoh, it does seem like there's a lot of deep in the core stuff that should be changed 2008-09-19 23:56 complaints about paging artifacts every day on lkml 2008-09-19 23:57 for years 2008-09-19 23:57 that was bad 2008-09-19 23:57 it just didn't work 2008-09-19 23:57 it's better now, where 2.6 may suck but it works 2008-09-19 23:57 that's a good base to step out and do some housecleaning 2008-09-19 23:57 how much of this is sucky code, or bad algos, and how much is it just being np-complete or unsolvable problems 2008-09-19 23:58 most of it is just sucky code 2008-09-19 23:58 nearly all 2008-09-19 23:58 we know about impossibility 2008-09-19 23:58 don't count that 2008-09-19 23:59 also, we have much better processes for bug tracking and nailing regressions 2008-09-19 23:59 back in the day it was just linus and some text mode mailer 2008-09-20 00:00 not pine, something like that 2008-09-20 00:00 does the fact linux is x-platform make it harder to write good code, or is that actually a benefit? 
2008-09-20 00:00 benefit, but not for the reason you'd think 2008-09-20 00:00 the arch maintainers have autonomy and don't have to listen to linus about what they check in 2008-09-20 00:01 that's a benefit? 2008-09-20 00:01 only thing arch can't touch is core, and there are numerous bypasses 2008-09-20 00:01 it is 2008-09-20 00:01 they can drop some stupidities 2008-09-20 00:01 like they can have kdb in their arch if they want 2008-09-20 00:01 ah 2008-09-20 00:01 doesn't that fragment the kernel though? 2008-09-20 00:02 yes 2008-09-20 00:02 enormous amount of cut and paste sloth across arches 2008-09-20 00:02 if code which is/should be x-platform ends up being arch 2008-09-20 00:02 that's why some interfaces change very slowly 2008-09-20 00:03 somehow I'd assumed arch-code was mostly asm and declarations of macros/support functions for the rest 2008-09-20 00:03 you'd think 2008-09-20 00:03 but now, huge parts of the vm are per-arch 2008-09-20 00:03 things like do_fault 2008-09-20 00:03 which hardly vary 2008-09-20 00:03 but each arch has its own 2008-09-20 00:05 right 2008-09-20 00:06 all arches are forced to follow the x86 page table model by the way, even if their paging does not work that way 2008-09-20 00:07 sometime you need to meet bill irwin 2008-09-20 00:07 far more ascerbic than me 2008-09-20 00:07 kill_litter_super - where does the name come from? 2008-09-20 00:07 litterbug 2008-09-20 00:08 crappiest name ever, almost 2008-09-20 00:09 why? 2008-09-20 00:10 what does a giant cockroach have to do with fs? 2008-09-20 00:10 doesn't say anything? 2008-09-20 00:10 oh 2008-09-20 00:10 litter... 
leave things lying around 2008-09-20 00:11 oh, maybe, ok 2008-09-20 00:11 as in trash 2008-09-20 00:11 still seems pointless 2008-09-20 00:11 true 2008-09-20 00:14 I think the litter refers to all dentries for the fs have to stay in cache 2008-09-20 00:14 because it has no backing store 2008-09-20 00:14 thus litter memory 2008-09-20 00:14 you'll have to ask viro to know for sure 2008-09-20 00:14 by any measure, one of the worst names ever 2008-09-20 00:15 but shouldn't those no longer be used at the point we unmount? 2008-09-20 00:23 ok, for an unbacked fs like ramfs, the dentries/inodes have to be prevented from disappearing 2008-09-20 00:23 because if they do there is not way to get them back 2008-09-20 00:23 a block backed fs can have them first flushed then evicted 2008-09-20 00:23 so dentry counts can drop to zero 2008-09-20 00:24 unback fs, they have to be forced to zero 2008-09-20 00:24 see d_genocide 2008-09-20 00:24 unbelievably crappy naming 2008-09-20 00:24 and nonexistent documentation 2008-09-20 00:24 take something simple and make it mysterious, it's hacker's job security 2008-09-20 00:28 :-) 2008-09-20 00:30 free_super_unpin would be more informative 2008-09-20 00:30 or 2008-09-20 00:30 unpin_super 2008-09-20 00:33 maze, what's the next move on junkfs? 
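The explanation between 00:23 and 00:24 can be modeled in a few lines. An unbacked filesystem pins every dentry with an extra reference, and at unmount those pins are dropped so the counts can reach zero. This is a toy sketch. `struct toy_dentry`, `pin`, `drop_pins` and `evictable` are hypothetical, and the fixed-size child array is a simplification. It only roughly mirrors ramfs taking an extra `dget()` when it creates a dentry and `kill_litter_super()` calling `d_genocide()`. The real code also skips negative dentries and runs under the dcache locks.

```c
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Toy dentry: a reference count and a few children; not struct dentry. */
struct toy_dentry {
	int count;
	size_t nchild;
	struct toy_dentry *child[TOY_MAX_CHILDREN];
};

/* With no backing store an evicted dentry could never be rebuilt, so
 * each one carries an extra pinning reference while the fs is mounted. */
static void pin(struct toy_dentry *d)
{
	d->count++;
}

/* At unmount, drop one pin from every dentry in the tree so the counts
 * can fall to zero and the tree can be torn down. */
static void drop_pins(struct toy_dentry *d)
{
	for (size_t i = 0; i < d->nchild; i++)
		drop_pins(d->child[i]);
	d->count--;
}

static int evictable(const struct toy_dentry *d)
{
	return d->count == 0;
}
```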
2008-09-20 00:34 going to be mucking around with the vm or implementing the root directory 2008-09-20 00:34 not sure which, maybe both 2008-09-20 00:35 if you just go with fs/tux3 which has your code in it you get the root directory for free 2008-09-20 00:36 vm doesn't really have much to do with fs 2008-09-20 00:36 right, but I don't learn anything ;-) 2008-09-20 00:36 you just use the interfaces, like cache_alloc etc 2008-09-20 00:36 learn about vma then if you must hurt yourself 2008-09-20 00:36 err 2008-09-20 00:36 dma 2008-09-20 00:37 though vma would not be a bad second chocie 2008-09-20 00:37 choice 2008-09-20 00:37 for hurt 2008-09-20 00:37 heh 2008-09-20 00:37 actually, learning about memory barriers would be useful 2008-09-20 00:37 even to fs work 2008-09-20 00:37 I really want to understand what all the generic implementations do 2008-09-20 00:37 and when and why we would want to not use them 2008-09-20 00:38 sure 2008-09-20 00:38 that's the block IO library 2008-09-20 00:38 try using it as an alternative to what we just did 2008-09-20 00:38 read your superblock via ->readpage 2008-09-20 00:38 hmm? you mean bread and brelse? 2008-09-20 00:38 also doing it sb_bread would be useful 2008-09-20 00:38 easier 2008-09-20 00:38 cruftier 2008-09-20 00:39 create some dentries 2008-09-20 00:39 and inodes 2008-09-20 00:39 link them up properly 2008-09-20 00:39 by hand 2008-09-20 00:39 right 2008-09-20 00:40 well, I'm out all day tomorrow and have work stuff for sunday (yeah, suxorz) so I won't get much done this weekend (end of quarter and all that) 2008-09-20 00:40 read path_walk 2008-09-20 00:40 that will keep you busy 2008-09-20 01:02 first checkin in two days 2008-09-20 01:02 haven't had a two day gap since the start 2008-09-20 01:02 maze's fault for getting me interested in bio hackery 2008-09-20 01:02 lol 2008-09-20 01:02 working on the kernel port I see 2008-09-20 01:02 was 2008-09-20 01:02 maze is going to next 2008-09-20 01:02 how bout you? 
2008-09-20 01:02 fun n easy 2008-09-20 01:03 oh still trying to fix out nfs 2008-09-20 01:03 out=our 2008-09-20 01:03 you work 24/7? 2008-09-20 01:03 well, a lot at times 2008-09-20 01:03 isn't child labor illegal? 2008-09-20 01:03 or something like that 2008-09-20 01:03 mostly struggling with various silly road blocks at this time, but I'm happy if it makes some kind of progress 2008-09-20 01:04 send over some of your friends then 2008-09-20 01:04 novell guys can't all be working 16 hours 2008-09-20 01:04 well, I've been watching TV tonight for a good chunk of it 2008-09-20 01:04 but I do work a lot at times 2008-09-20 01:04 need I point out that novell and redhat are both 100% mia for tux3? 2008-09-20 01:04 who is going to win the lamer race? 2008-09-20 01:04 sometimes more focused at various points than others, but I do schedule breaks for myself 2008-09-20 01:05 working on a file system is non-trival 2008-09-20 01:09 most folks in our group are focused on kvm which is a full time deal 2008-09-20 01:09 you know how it works 2008-09-20 01:10 and there's plenty of scheduler work that needs to be done in the general community 2008-09-20 01:10 plenty of design level stuff that needs to be done as well which is also non-trivial 2008-09-20 01:10 hard to find time 2008-09-20 01:11 it tux3 was more complete you might see more effort from various folks at RH or Novell to bug fix, but I think that a lot of folks in the Linux community aren't really interested in solving those probelms since they are happy with ext3 2008-09-20 01:12 but surely somebody in novell 2008-09-20 01:12 can read the code and understand 2008-09-20 01:12 in the entire time I've been there I've yet to talk to a file systems person 2008-09-20 01:12 true 2008-09-20 01:12 at novell 2008-09-20 01:12 can't think of one myself 2008-09-20 01:13 at red hat there's only stephen tweedie, then the gfs2 wan^developers 2008-09-20 01:13 I'm about the closest since I use to work on WAFL which has a completely 
different set of APIs 2008-09-20 01:13 I should ping sct 2008-09-20 01:13 come to think of it 2008-09-20 01:13 so you folks are on your own for the most part 2008-09-20 01:13 still 2008-09-20 01:13 which is a good and bad thing 2008-09-20 01:13 fs hackers can be made 2008-09-20 01:13 good ;-) 2008-09-20 01:13 it takes a weekend or too 2008-09-20 01:13 two 2008-09-20 01:13 it's a lot harder than that unless you're working on a toy system 2008-09-20 01:13 :) 2008-09-20 01:13 more then that 2008-09-20 01:14 not really 2008-09-20 01:14 hacking ability is more important that fs knowledge 2008-09-20 01:14 logical thinking 2008-09-20 01:14 bug seeing 2008-09-20 01:14 well yeah, but that's true for everything ;-) 2008-09-20 01:14 in hacking,yes 2008-09-20 01:14 well, you know the code really well and have had a lot of experience with file systems, I have some notion of how an enterprise file system should function, but really blank on the implementation details 2008-09-20 01:15 I'd much rather teach a talented python hack to code kernel than an average kernel hack to... do anything 2008-09-20 01:15 like where to go once you've done the b-tree things, atomic logging, etc... 
2008-09-20 01:15 I understand 2008-09-20 01:15 it's going to be: extents; atomic commit; kernel 2008-09-20 01:15 with kernel in parallel 2008-09-20 01:15 maze is on it 2008-09-20 01:16 I mean the good thing about this project is the fact that is has some kind of leader, you 2008-09-20 01:16 can't think of a project that doesn't 2008-09-20 01:16 that allows you to guide folks to an implementation, even if I know the fs interfaces and stuff, it wouldn't complete the knowledge base you have with write allocation and stuff 2008-09-20 01:17 you've thought it out carefully and stored that in your head for retrival 2008-09-20 01:17 there's always a part for everybody to work on 2008-09-20 01:17 the fuse thing was sweet 2008-09-20 01:17 flipz: for folks to justify time into tux3, it's got to work on a basic level 2008-09-20 01:18 it does 2008-09-20 01:18 that means in-kernel, atomic commits, snapshots, off-line checker and all of the basic for it to function like a regular file system 2008-09-20 01:18 most of which are still in the future 2008-09-20 01:18 not really 2008-09-20 01:18 working in fuse is working "to a level" 2008-09-20 01:18 don't understand why you'd want an offline checker 2008-09-20 01:19 wouldn't online be better? 2008-09-20 01:19 gh, it doesn't 2008-09-20 01:19 nonsense 2008-09-20 01:19 every filesystem has gotten off the ground before all the stuff was done 2008-09-20 01:19 MaZe: basic checking, online checking is harder 2008-09-20 01:19 zfs still doesn't have a checker 2008-09-20 01:19 yeah, well, zfs is driven by a lot of hyper 2008-09-20 01:19 hype 2008-09-20 01:19 and tons of Sun marketing 2008-09-20 01:20 so there you go 2008-09-20 01:20 zfs is weird 2008-09-20 01:20 MaZe: tell me about it 2008-09-20 01:20 I'd say, yes 2008-09-20 01:20 not that what I'm planning isn't weird... 
2008-09-20 01:20 you're planning to reimplment hammer 2008-09-20 01:20 flipz: getting a strong functional implementation is key to getting a lot more developers 2008-09-20 01:20 why not just port hammer? 2008-09-20 01:20 nah 2008-09-20 01:20 bh, I don't want a lot more devs 2008-09-20 01:21 I want a couple more good ones 2008-09-20 01:21 redhat hacks can join the party late, what's new 2008-09-20 01:21 flipz: you have a lot of technical problems still, I can make some suggestions if I see something useful that I can add 2008-09-20 01:21 it's true that what you probably want is a half dozen good devs you can put in one room and have them bounce off of each other 2008-09-20 01:21 cyber room 2008-09-20 01:22 flipz: all of which is a good thing in that it still needs to be solved 2008-09-20 01:22 cyber is good, but real is better 2008-09-20 01:22 tons of intersting work still ahead and discoveries 2008-09-20 01:22 maze, halloween cabal party 2008-09-20 01:22 two locations in venice 2008-09-20 01:22 primary and overflow 2008-09-20 01:22 venice in europe? 2008-09-20 01:22 venice beach 2008-09-20 01:22 in socal 2008-09-20 01:23 oh, in santa monica 2008-09-20 01:23 a little south 2008-09-20 01:23 flipz: unfortunately, I can't help you get it off the ground, I might of help in the future 2008-09-20 01:23 skating distance 2008-09-20 01:23 btw venice, ca on gmaps finds venice, ab, ca 2008-09-20 01:23 hah 2008-09-20 01:23 fix it 2008-09-20 01:23 regarding things like concurrency as you run into scalability problems, etc... 
2008-09-20 01:23 354 minles 2008-09-20 01:23 miles 2008-09-20 01:24 just book a trip 2008-09-20 01:24 nah, I'm cheap 2008-09-20 01:24 you don't have to justify, it's a short hop 2008-09-20 01:24 I mean on goog's nickle 2008-09-20 01:24 won't fly 2008-09-20 01:25 santa monica peeps lose touch unless mtv peeps show up from time to time 2008-09-20 01:25 it's mercy travel 2008-09-20 01:25 lol 2008-09-20 01:25 well 2008-09-20 01:25 try this 2008-09-20 01:25 gotta take it private 2008-09-20 01:25 yeah, well I find it hard to even visit a datacenter and that's actually work related 2008-09-20 01:26 flipz: LA is wierd 2008-09-20 01:27 seems ok to me 2008-09-20 01:27 besides I like driving, and don't like flying... 2008-09-20 01:28 LA is weird, but so is SF 2008-09-20 01:28 SF is much better, it's much more diverse 2008-09-20 01:28 you can yourself into and out of trouble in SF 2008-09-20 01:29 I know most of the trouble spots 2008-09-20 01:29 plenty of experience with that 2008-09-20 01:30 only thing is 2008-09-20 01:30 the halloween cabal party is in venice, not sfo 2008-09-20 01:31 anyway it's warmer here 2008-09-20 01:31 better rollerskating 2008-09-20 01:31 more sand 2008-09-20 01:31 you know 2008-09-20 01:31 heh 2008-09-20 01:31 yeah, it's getting cool up north 2008-09-20 01:32 chicks wear less clothing and are on more hippy drugs 2008-09-20 01:32 flipz: a high performance parallelized implementation will be interesting because the Linux kernel itself isn't very scalable in the core fs core 2008-09-20 01:32 code 2008-09-20 01:32 bh, you said it 2008-09-20 01:32 namely posix file locking 2008-09-20 01:33 ugh, posix, 2008-09-20 01:33 I can help you there 2008-09-20 01:33 ugh, posix 2008-09-20 01:33 can't live without it 2008-09-20 01:33 no shit 2008-09-20 01:33 nobody uses it really 2008-09-20 01:33 can't live with it 2008-09-20 01:33 I never saw anybody use it 2008-09-20 01:33 ah 2008-09-20 01:34 so you have some ideas on range locks? 
2008-09-20 01:34 yes 2008-09-20 01:34 want to expound? 2008-09-20 01:34 tree? 2008-09-20 01:34 but of course 2008-09-20 01:34 or something smarter? 2008-09-20 01:34 now what kind... 2008-09-20 01:34 maybe it should be done on a per inode basis 2008-09-20 01:34 smarter than a tree? dwimlocks 2008-09-20 01:34 it's in memory 2008-09-20 01:34 so something optimized for in-mem 2008-09-20 01:35 so probably not b-tree 2008-09-20 01:35 use an hball structure 2008-09-20 01:35 a magic 8ball that would be coo 2008-09-20 01:35 l 2008-09-20 01:35 ;-) 2008-09-20 01:35 yah 2008-09-20 01:35 geodesic pointers 2008-09-20 01:36 I think CLR has a red-black tree generalize to interval tree exercise 2008-09-20 01:36 If I remember correctly the generlizement works correctly ... 2008-09-20 01:36 blah 2008-09-20 01:36 no need 2008-09-20 01:36 actually 2008-09-20 01:36 flipz: one of the only places that I've seen lock contention activity is one of the inode locks 2008-09-20 01:36 no need 2008-09-20 01:36 no you do need it 2008-09-20 01:36 bh, which one? 2008-09-20 01:36 I think it's lock during directory traversal or something like that 2008-09-20 01:36 since you can have multi-read-locks, but only one write -lock 2008-09-20 01:37 sure it's not rename? 
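The range-lock semantics discussed above come down to an overlap test. Any number of overlapping readers may share a range, a writer excludes everything it overlaps, and the table is kept per inode in memory. The sketch below is hypothetical in every name. It deliberately uses a linked list rather than the interval (augmented red-black) tree from CLR that was suggested, so lookups are O(n). It also does not split or merge ranges on a partial unlock the way POSIX `fcntl()` locks must. Callers are assumed to serialize access to the table themselves, for example with a per-inode mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

enum range_mode { RANGE_READ, RANGE_WRITE };

/* One held lock on the half-open byte range [start, end). */
struct range_lock {
	uint64_t start, end;
	enum range_mode mode;
	struct range_lock *next;
};

/* Hypothetical per-inode table of held locks: a list, not a tree. */
struct range_table {
	struct range_lock *head;
};

static bool overlaps(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1)
{
	return a0 < b1 && b0 < a1;
}

/* Readers share; a writer conflicts with anything it overlaps. */
static bool range_conflicts(const struct range_table *t, uint64_t start,
			    uint64_t end, enum range_mode mode)
{
	for (const struct range_lock *l = t->head; l; l = l->next)
		if (overlaps(start, end, l->start, l->end) &&
		    (mode == RANGE_WRITE || l->mode == RANGE_WRITE))
			return true;
	return false;
}

/* Non-blocking acquire: returns the new lock, or NULL on conflict. */
static struct range_lock *range_trylock(struct range_table *t, uint64_t start,
					uint64_t end, enum range_mode mode)
{
	struct range_lock *l;

	if (start >= end || range_conflicts(t, start, end, mode))
		return NULL;
	l = malloc(sizeof *l);
	if (!l)
		return NULL;
	*l = (struct range_lock){ start, end, mode, t->head };
	t->head = l;
	return l;
}

static void range_unlock(struct range_table *t, struct range_lock *lock)
{
	for (struct range_lock **pp = &t->head; *pp; pp = &(*pp)->next)
		if (*pp == lock) {
			*pp = lock->next;
			free(lock);
			return;
		}
	assert(0 && "range_unlock: lock not held");
}
```

Swapping the list for an interval tree changes only `range_conflicts()` and the insert and remove steps. The conflict rule stays the same.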
2008-09-20 01:37 I looked at the problem at there was many places in ext3 that uses this lock in a generic fashion, 2008-09-20 01:37 atomic rename 2008-09-20 01:37 flipz: can't remember 2008-09-20 01:37 the toughest thing to get right, that you absolutely must have 2008-09-20 01:37 it showed up in a "find" load so it's easy to reproduce 2008-09-20 01:37 I implemented the first revision of lockstat for this purpose 2008-09-20 01:38 low impact contention measurements in -rt 2008-09-20 01:38 peterz reimplemented this in lockdep 2008-09-20 01:38 I wonder if an fs could be done with something rcu like 2008-09-20 01:38 so this should definitely show up in trivially reproducable runs 2008-09-20 01:38 bh, let us know when you reproduce 2008-09-20 01:38 MaZe: fine grained locking maybe 2008-09-20 01:39 or per cpu-ification 2008-09-20 01:39 I use to have the runs for it 2008-09-20 01:39 maze, rcu is certainly applicable to fs 2008-09-20 01:39 maze, but its a walk/run situation 2008-09-20 01:39 that won't be easy to get right 2008-09-20 01:39 flipz: just compile in lockdep with stats tracking and cat /proc/lock_stats 2008-09-20 01:39 bh, just tell us ;) 2008-09-20 01:40 we'll do lockdep etc when we get there 2008-09-20 01:40 I have to compile a custom kernel or something like that to figure that out 2008-09-20 01:40 flipz: it'll still be there when you get it into the kernel :) 2008-09-20 01:40 no rush 2008-09-20 01:40 locking is really not the major issue in fs, seek is, and if you mess up, indexing 2008-09-20 01:40 and block allocation 2008-09-20 01:41 that is major 2008-09-20 01:41 blows away locking in actual impact 2008-09-20 01:41 well, what if you have a high performance IO situation ? wouldn't that eventually push things ? 
2008-09-20 01:41 probably would need many cpus for it to push 2008-09-20 01:41 if your allocation/seeking suck, which they probably do, you don't care 2008-09-20 01:41 and some sort of wicked raid array to run on 2008-09-20 01:42 what if you have contention against, say, atomic logging 2008-09-20 01:42 fix it when you get there 2008-09-20 01:42 MP atomic logging 2008-09-20 01:42 could be very tricky 2008-09-20 01:42 it's not the logging which is tricky 2008-09-20 01:42 probably isn't 2008-09-20 01:42 it's the mutates 2008-09-20 01:42 just use bio for everything 2008-09-20 01:43 but tux3 isn't there yet so we can't talk about the issues 2008-09-20 01:43 contention will never be in logging 2008-09-20 01:43 inherently async, inherently multi cpu 2008-09-20 01:43 it'll always be on metadata in-mem updates 2008-09-20 01:43 mostly at the dir and up level 2008-09-20 01:43 maze, interesting proposition 2008-09-20 01:43 well, that's still subject to atomic logging right ? 2008-09-20 01:43 yeah, but atomic logging is almost a no-op 2008-09-20 01:43 what if you have a very heavy metadata load ? 2008-09-20 01:44 it's short and sweet and very easy to shard 2008-09-20 01:44 I'd tend to agree 2008-09-20 01:44 the contention is in metadata updates, possibly in block allocation 2008-09-20 01:44 possilby 2008-09-20 01:44 not in the logging, since logging is pretty much just adding a new element to a queue 2008-09-20 01:44 these kind of systems could put a lot of pressure on a single tree data structure, etc... maybe you need to lock for online checking, etc... 
it's hard to say until you have a fairly complete basic implementation 2008-09-20 01:45 allowing parallel access to the allocation bitmap will be fun 2008-09-20 01:45 right 2008-09-20 01:45 sounds like a pain in the ass kind of problem :) 2008-09-20 01:45 bh, the nice thing about trees is they have subtrees 2008-09-20 01:45 it might be you may end up using a 'usually-works' type of algo 2008-09-20 01:45 flipz: yeah, but the allocation bitmap is a bitch 2008-09-20 01:45 with fallback on collision detection to locking 2008-09-20 01:45 bh, why? 2008-09-20 01:46 the bitch is finding the area to allocate 2008-09-20 01:46 it'll have to be protect in cases where you have heavy write operations, right ? 2008-09-20 01:46 one word: range lock 2008-09-20 01:46 which is inherently a read only op 2008-09-20 01:46 maze, right 2008-09-20 01:46 and the writes are again - quick 2008-09-20 01:46 well, those traversals can be long right ? 2008-09-20 01:46 yes, against cache 2008-09-20 01:46 so basically something rcu-like works 2008-09-20 01:46 spinlock zone 2008-09-20 01:47 although writing the algo with per-cpu 2008-09-20 01:47 when the long traversals get to be the problem, we're winning 2008-09-20 01:47 something like using a hash function with a cpunum parameter 2008-09-20 01:47 then we need a second order map to say where the high/low density areas are 2008-09-20 01:47 shorten the traversal, right 2008-09-20 01:47 rcu might work really well with the bitmap 2008-09-20 01:48 theoretically you could switch allocation strategies based on load and disk fullness 2008-09-20 01:48 simply because tux3 has the notion of logging allocations rather than directoy entering them 2008-09-20 01:48 well 2008-09-20 01:48 hmm 2008-09-20 01:48 hah 2008-09-20 01:48 it also has the notion of keeping the cache blocks up to date 2008-09-20 01:48 right, you'd almost need rcu 2008-09-20 01:48 which conflicts with rcu 2008-09-20 01:48 um 2008-09-20 01:48 it doesn't have to be precise fortunately 
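The "usually-works" allocation idea above can be sketched with C11 atomics. Each CPU starts at a different place in the bitmap, derived from a hash of the CPU number, and collisions are handled only when they actually happen. This is an illustration under assumptions, not tux3 code. The flat 1024-block bitmap, the multiplicative hash constant and the function names are all hypothetical. A collision here simply moves on to the next bit rather than falling back to a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_BLOCKS 1024u
#define BITS_PER_WORD 64u

/* Hypothetical in-memory free-space bitmap: a set bit means "allocated". */
static _Atomic uint64_t bitmap[NR_BLOCKS / BITS_PER_WORD];

/* Spread CPUs across the volume so they rarely touch the same words.
 * 2654435761 is an arbitrary odd multiplier (multiplicative hashing). */
static unsigned goal_for_cpu(unsigned cpu)
{
	return (cpu * 2654435761u) % NR_BLOCKS;
}

/* Claim one free block, scanning from this CPU's goal. atomic_fetch_or
 * returns the old word: if our bit was already set, another CPU won the
 * race for that block and we just try the next one. */
static long alloc_block(unsigned cpu)
{
	unsigned goal = goal_for_cpu(cpu);

	for (unsigned i = 0; i < NR_BLOCKS; i++) {
		unsigned block = (goal + i) % NR_BLOCKS;
		uint64_t mask = UINT64_C(1) << (block % BITS_PER_WORD);
		_Atomic uint64_t *word = &bitmap[block / BITS_PER_WORD];

		if (atomic_load_explicit(word, memory_order_relaxed) & mask)
			continue;               /* cheap read-only check first */
		if (!(atomic_fetch_or(word, mask) & mask))
			return block;           /* we set the bit: block is ours */
	}
	return -1;                              /* volume full */
}

static void free_block(unsigned block)
{
	uint64_t mask = UINT64_C(1) << (block % BITS_PER_WORD);
	uint64_t old = atomic_fetch_and(&bitmap[block / BITS_PER_WORD], ~mask);

	assert(old & mask);                     /* freeing a free block is a bug */
	(void)old;
}
```

The read-only check before the atomic OR keeps most of the scan to plain cache reads. Only the final claim of a bit needs the atomic read-modify-write.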
2008-09-20 01:48 but 2008-09-20 01:48 don't need rcu 2008-09-20 01:48 why conflicts? 2008-09-20 01:49 can do as well as rcu without it 2008-09-20 01:49 rcu doesn't like changing things 2008-09-20 01:49 that's very slow 2008-09-20 01:49 I'm not even sure a bitmap is the right way to go 2008-09-20 01:49 true 2008-09-20 01:49 you may want something sparser 2008-09-20 01:49 read my writing on that? 2008-09-20 01:49 nope 2008-09-20 01:49 there's a post 2008-09-20 01:49 analyzed in detail 2008-09-20 01:49 thinking of something tree with bitmap leaf like 2008-09-20 01:49 bitmap has 25/2 advantage or something like that in some cases 2008-09-20 01:49 extent tree has just as much or more in others 2008-09-20 01:50 braindead simple solution 2008-09-20 01:50 indeed? 2008-09-20 01:50 use both? 2008-09-20 01:50 in regions where bitmap is better, use a bitmap, otherwise extents 2008-09-20 01:50 in regions where it's a tie you don't care which 2008-09-20 01:50 uh, so tree with bitmap leafs ;-) 2008-09-20 01:50 use whichever is already there 2008-09-20 01:50 simpler 2008-09-20 01:50 sparse bitmap? 
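The rule "in regions where bitmap is better, use a bitmap, otherwise extents" can be made concrete with a byte count. This is a minimal sketch that assumes 8 bytes per extent and one bit per block in the bitmap. Both are illustrative numbers, not tux3's format. A region with a few long runs favours extents, a fragmented region favours the bitmap, and a tie keeps whatever is already there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed on-disk cost of one extent, for illustration only. */
#define EXTENT_BYTES 8u

enum region_repr { REPR_BITMAP, REPR_EXTENTS };

/* Count runs of allocated blocks: each run would become one extent. */
static size_t count_extents(const bool *used, size_t nblocks)
{
	size_t runs = 0;

	for (size_t i = 0; i < nblocks; i++)
		if (used[i] && (i == 0 || !used[i - 1]))
			runs++;
	return runs;
}

/* Pick the cheaper encoding for one region; on a tie keep what is there. */
static enum region_repr choose_repr(const bool *used, size_t nblocks,
				    enum region_repr current)
{
	assert(nblocks % 8 == 0);
	size_t bitmap_bytes = nblocks / 8;
	size_t extent_bytes = count_extents(used, nblocks) * EXTENT_BYTES;

	if (bitmap_bytes < extent_bytes)
		return REPR_BITMAP;
	if (extent_bytes < bitmap_bytes)
		return REPR_EXTENTS;
	return current;
}
```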
2008-09-20 01:50 bitmap stays like it is 2008-09-20 01:51 extent tree can have bitmap regions as leaves instead of extent block 2008-09-20 01:51 just a logical offset in the bitmap 2008-09-20 01:51 exactly what I was thinking 2008-09-20 01:51 instead of a pointer to a leaf block 2008-09-20 01:51 good 2008-09-20 01:51 so you have a tree which can have leaves - either describing the state, or pointing to the right fragment of a sparse bitmap 2008-09-20 01:52 yes 2008-09-20 01:52 use some left over bits in the extent tree for accelleration 2008-09-20 01:52 and you probably allocate space for it just like for any normal file 2008-09-20 01:52 of course 2008-09-20 01:52 in fact both are mapped into normal fiels 2008-09-20 01:52 possibly need to have a few blocks of reserve space to prevent weird cases 2008-09-20 01:53 I showed that's better than direct block pointers 2008-09-20 01:53 there is a weird case 2008-09-20 01:53 very weird 2008-09-20 01:53 when while allocating space the tree and bitmap are full and you need to split and allocate more blocks, etc 2008-09-20 01:53 the bitmap is sparse... so you go set a bit in it, that allocs a block and marks a bit in another block... which might be sparse... 
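The hybrid free-space tree described above has leaves of two kinds. A leaf either describes allocation state itself or just gives a logical offset into a sparse bitmap file. One such leaf might look like the sketch below. The struct layout, the inline extent array and the in-memory stand-in for the bitmap file are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct extent {
	uint64_t start, len;
};

/* Hypothetical leaf covering blocks [base, base + count): either a list
 * of allocated extents, or a bit offset into a separate bitmap file. */
struct fs_leaf {
	uint64_t base, count;
	bool is_bitmap;
	union {
		struct {
			const struct extent *ext;
			size_t n;
		} extents;
		uint64_t bitmap_offset;   /* logical bit offset, not a block pointer */
	} u;
};

/* Stand-in for reading the bitmap file: here just an in-memory array. */
static bool bitmap_test(const uint8_t *bitmap, uint64_t bit)
{
	return (bitmap[bit / 8] >> (bit % 8)) & 1;
}

static bool leaf_block_used(const struct fs_leaf *leaf, const uint8_t *bitmap,
			    uint64_t block)
{
	assert(block >= leaf->base && block < leaf->base + leaf->count);
	if (leaf->is_bitmap)
		return bitmap_test(bitmap, leaf->u.bitmap_offset + (block - leaf->base));
	for (size_t i = 0; i < leaf->u.extents.n; i++) {
		const struct extent *e = &leaf->u.extents.ext[i];

		if (block >= e->start && block < e->start + e->len)
			return true;
	}
	return false;
}
```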
2008-09-20 01:53 yeah, obviously need to either be very careful or do this right 2008-09-20 01:53 "terminate me this" 2008-09-20 01:54 do this right 2008-09-20 01:54 be aware of it 2008-09-20 01:54 think about it clearly 2008-09-20 01:54 the advantage of spare allocation of the bitmap is compelling imho 2008-09-20 01:54 it should terminate if you lean the algos to choosing to non-split blocks 2008-09-20 01:54 I also posted about that 2008-09-20 01:54 same thing for the logging 2008-09-20 01:54 algorithm proposals welcome 2008-09-20 01:54 should also keep the tree from degenerating to a non-sparse bitmap 2008-09-20 01:55 shapor was going to scrape the ml for those posts 2008-09-20 01:55 well 2008-09-20 01:55 it's more important to choose optimal locations 2008-09-20 01:55 I've barely read any of the ml posts... just not enough time 2008-09-20 01:55 than worry about how the bitmap is split 2008-09-20 01:55 ah 2008-09-20 01:55 well let me find one 2008-09-20 01:55 yes ... but no 2008-09-20 01:55 about the bitmaps vs btree 2008-09-20 01:56 8910 flips 2008-09-20 01:56 2885 MaZe 2008-09-20 01:56 1696 shapor 2008-09-20 01:56 1003 bh 2008-09-20 01:56 876 flipz 2008-09-20 01:56 671 konrad 2008-09-20 01:56 418 tim_dimm 2008-09-20 01:56 196 RazvanM 2008-09-20 01:56 133 Bushman 2008-09-20 01:56 latest stats 2008-09-20 01:56 your closing 2008-09-20 01:56 of late it seems flipz has abandoned coding in favour of frivilous chat 2008-09-20 01:57 ;-) 2008-09-20 01:57 or he just likes to talk about interestig problems 2008-09-20 01:58 allocation is a good one 2008-09-20 01:58 oh, you're still here? so what was your idea ;-)? 
2008-09-20 01:59 you seemed to disappear right when you were getting to the good part 2008-09-20 01:59 flipz: actually pulled an old trick out of his hat and came up wit a good solution 2008-09-20 01:59 maze, "More about the free tree" 2008-09-20 02:00 http://kerneltrap.org/mailarchive/tux3/2008/8/16/2959334 2008-09-20 02:00 not the one I was thinking of 2008-09-20 02:01 reading 2008-09-20 02:01 "All about the free tree" 2008-09-20 02:01 http://kerneltrap.org/mailarchive/tux3/2008/8/13/2929244 2008-09-20 02:01 right previous one 2008-09-20 02:01 whoever said the fs has to be clean on unmount? 2008-09-20 02:02 just rely on the journal recovery getting memory state consistent 2008-09-20 02:02 problem solved 2008-09-20 02:02 right 2008-09-20 02:02 relating to recursion 2008-09-20 02:02 continuing to parse 2008-09-20 02:02 my point from the first tux3 post 2008-09-20 02:02 breaking new ground it seems 2008-09-20 02:03 hmm? 2008-09-20 02:03 nobody else does it that way 2008-09-20 02:03 oh, yeah, scarredy cats 2008-09-20 02:03 hmm, that's mis-spelt (as is this) I think 2008-09-20 02:03 there are scaredy cats an scarred cats ;) 2008-09-20 02:04 you just have to log all (de-)allocates etc 2008-09-20 02:04 yeah it's a little complex but very satisfying 2008-09-20 02:04 nice we have some kickass bio primitives to log with huh? 2008-09-20 02:05 those ones I did earlier today are really quite nice and general 2008-09-20 02:05 you start the fs with just the journal 2008-09-20 02:05 only thing you might want is alloc flags... 
but then maybe not even 2008-09-20 02:05 and empty trees and structures 2008-09-20 02:05 no journal 2008-09-20 02:05 then you dump the allocation for the superblock and log into the journal 2008-09-20 02:05 journals are for lamerz -- flipz 2008-09-20 02:06 and you let the kernel module (journal -> forward log) deal with creating the fs 2008-09-20 02:06 no complexity in the mkfs code at all 2008-09-20 02:06 tux3 already creates the fs extremely elegantly 2008-09-20 02:06 hard to improve on, really 2008-09-20 02:06 still parsing ;-) 2008-09-20 02:06 Creating a new Tux3 filesystem requires allocating a number of objects, 2008-09-20 02:06 including objects involved in allocation.  Another nice recursion 2008-09-20 02:06 there: you have to allocate space for objects, but the block allocator 2008-09-20 02:06 is not initialized yet. <- where I am 2008-09-20 02:07 ah 2008-09-20 02:07 right that was fun 2008-09-20 02:07 a few segfaults on the way to sorting it 2008-09-20 02:08 do you mark the superblock as a used block in the alloc tree? 2008-09-20 02:08 superblock(s) 2008-09-20 02:08 yes 2008-09-20 02:08 ok 2008-09-20 02:08 ah, you chickened out at the allocation strategy ;-) 2008-09-20 02:09 that's always where my reasoning breaks down... 2008-09-20 02:09 line 480 "reserve superblock" http://tux3.org/tux3?f=bcfdc76d14a8;file=user/test/inode.c 2008-09-20 02:09 didn't chicken out 2008-09-20 02:09 recognized how big a post it would be and deferred it 2008-09-20 02:09 but there are writings 2008-09-20 02:09 just one? 
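The suggestion above has two parts. First, log every (de)allocation and let recovery rebuild the in-memory state. Second, go as far as making mkfs nothing more than a log whose first record reserves the superblock. Both can be sketched as an append-only record queue plus an in-order replay. The record layout, the fixed-size queue and all names are hypothetical; this is not tux3's log format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocation log record. */
enum log_op { LOG_ALLOC, LOG_FREE };

struct log_rec {
	enum log_op op;
	uint32_t block, count;
};

#define LOG_MAX 64

struct alloc_log {
	struct log_rec rec[LOG_MAX];
	size_t n;
};

/* Logging an allocation is just appending a small record to a queue. */
static void log_append(struct alloc_log *lg, enum log_op op,
		       uint32_t block, uint32_t count)
{
	assert(lg->n < LOG_MAX);
	lg->rec[lg->n++] = (struct log_rec){ op, block, count };
}

static void set_range(uint8_t *map, uint32_t block, uint32_t count, int used)
{
	for (uint32_t b = block; b < block + count; b++) {
		uint8_t bit = (uint8_t)(1u << (b % 8));

		if (used)
			map[b / 8] |= bit;
		else
			map[b / 8] &= (uint8_t)~bit;
	}
}

/* Recovery: apply the log, in order, to the last bitmap that reached disk.
 * A "mkfs" that is only a log gets replayed onto an all-free bitmap. */
static void replay(uint8_t *map, const struct alloc_log *lg)
{
	for (size_t i = 0; i < lg->n; i++)
		set_range(map, lg->rec[i].block, lg->rec[i].count,
			  lg->rec[i].op == LOG_ALLOC);
}
```

Because replay is a pure function of the old bitmap and the log, the bitmap never has to be written out at commit time.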
2008-09-20 02:09 the essential point 2008-09-20 02:09 a farily long one 2008-09-20 02:10 ends with "generating functions" 2008-09-20 02:10 I've been thinking an fs should have to superblocks 2008-09-20 02:10 haven't written that one yet 2008-09-20 02:10 so called bracket-blocks 2008-09-20 02:10 one in front, one at the end 2008-09-20 02:10 you can then support extending a file-system forward and backward 2008-09-20 02:10 http://kerneltrap.org/mailarchive/tux3/2008/8/27/3094404 [Tux3] Spacial correlation between directory entries, inodes and file data 2008-09-20 02:11 if the front or back of the bdev remains unchanged between mounts you can fix up and resize the filesystem on the fly 2008-09-20 02:11 should make resizing easier 2008-09-20 02:11 why do you want to extend forwards? 2008-09-20 02:11 or I'm not sure, backwards? 2008-09-20 02:12 interesting ain't it? 2008-09-20 02:12 not sure 2008-09-20 02:12 because normal fs extend -> 2008-09-20 02:12 so it's natural to want to be able to extend <- 2008-09-20 02:12 to be able to share space between them 2008-09-20 02:12 without having to have lvm in the middle 2008-09-20 02:12 did god intend that? 2008-09-20 02:13 why not have lvm in the middle? 2008-09-20 02:13 an extra layer 2008-09-20 02:13 not really 2008-09-20 02:13 with little apparent benefit 2008-09-20 02:13 provisioning 2008-09-20 02:13 plus it then ends up remapping disk order 2008-09-20 02:13 big benefit 2008-09-20 02:13 so? 
2008-09-20 02:13 which breaks space optimizations you might otherwise attempt 2008-09-20 02:13 remap in big chunks 2008-09-20 02:13 seeks across an lvm 2008-09-20 02:13 that fixes that 2008-09-20 02:13 are no longer like seeks across a normal disk 2008-09-20 02:14 no, they're faster 2008-09-20 02:14 across an array 2008-09-20 02:14 yes, true 2008-09-20 02:14 but I still think you want to be able to keep the order of blocks in the fs the same as on disk - even if it gets split across multiple disks and raid5/6ed 2008-09-20 02:15 the cost of lvm is much less than you think, see my bio stacking patches 2008-09-20 02:15 argh, bh disappeared again 2008-09-20 02:15 I'm not thinking of cpu kernel nor stack cost 2008-09-20 02:15 youcan permute in big chunks without loss of performance 2008-09-20 02:15 I'm thinking of cost of the disk no longer being linear 2008-09-20 02:15 that's key 2008-09-20 02:15 that's one of the few things we have going for us 2008-09-20 02:16 but you can't have fluid space allocation between multiple fs 2008-09-20 02:16 you do 2008-09-20 02:16 although I'm not sure having that would be a good thing ;-) 2008-09-20 02:16 just at a coarser granularity than you're thinking 2008-09-20 02:17 I am thinking, the provision granularity will be units of 128 MB 2008-09-20 02:17 oh, I know another reason why I liked this 2008-09-20 02:17 because of dr 2008-09-20 02:17 coincidentally, the number of 4K blocks you can map with one 4K block 2008-09-20 02:17 recovering the fs from a partially damaged disk 2008-09-20 02:18 getting late 2008-09-20 02:18 you basically want to survive in some pseudo decent state a situation in which multi-megabyte pieces of the disk go awol 2008-09-20 02:18 and I only checked in half the code I'd planeed 2008-09-20 02:18 indeed 2008-09-20 02:18 been meaning to say: 2008-09-20 02:18 yes 2008-09-20 02:18 good rule to write into the plan 2008-09-20 02:18 it shall be so 2008-09-20 02:18 signing off and rebooting into mac to finally install 
spore 2008-09-20 02:18 but have gotten caught up in conversations ;-) 2008-09-20 02:19 heh 2008-09-20 02:19 maybe I have to try that lucasarts demo again 2008-09-20 02:19 have to put on a show for my daughter tomorrow 2008-09-20 02:19 get back from hols 2008-09-20 02:19 I believe it should be possible to write the kernel fs 2008-09-20 02:19 code in such a way that no user space utilities would be required 2008-09-20 02:19 there is no reason why the kernel can't repair the fs 2008-09-20 02:20 oh yeah, especially considering I already did it 2008-09-20 02:20 since the kernel code has to be fault tolerant anyway 2008-09-20 02:20 well there's no fsck 2008-09-20 02:20 but there is mkfs 2008-09-20 02:20 can easily be done on mount 2008-09-20 02:20 -!- pgquiles(~pgquiles@229.Red-83-49-101.dynamicIP.rima-tde.net) has joined #tux3 2008-09-20 02:20 even that could be a couple function in kernel with a mount mkfs option 2008-09-20 02:20 hey pgquiiles ;) 2008-09-20 02:20 since most of the mkfs ends up being inkernel anyways 2008-09-20 02:20 just when we were heading to our consoles 2008-09-20 02:21 there's going to be -omake for tux3 2008-09-20 02:21 huh? 2008-09-20 02:21 oh 2008-09-20 02:21 as in -omkfs? 2008-09-20 02:21 mount -ttux3 -omake /dev/foo /mnt/rulez 2008-09-20 02:21 yep 2008-09-20 02:21 mount -ttux3 -omkfs /dev/foo /mnt/rulez 2008-09-20 02:21 better I guess 2008-09-20 02:22 trivial 2008-09-20 02:22 exactly 2008-09-20 02:22 useful 2008-09-20 02:22 I didn't realize I always wanted that tillyou mentioned it 2008-09-20 02:22 ok... options parsing 2008-09-20 02:22 yeah, and it'll probably share a fair bit of code with the recovery parts of the kernel 2008-09-20 02:22 my ask for next in junkfs 2008-09-20 02:22 ok? 2008-09-20 02:22 ...maybe... 
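The 128 MB provisioning unit proposed earlier follows from block arithmetic if "map" is read as one bit per block, as in a bitmap. That reading gives exactly 128 MiB:

```latex
\begin{aligned}
4096\ \text{bytes} \times 8\ \tfrac{\text{bits}}{\text{byte}} &= 32768\ \text{bits} = 32768\ \text{blocks mapped} \\
32768 \times 4096\ \text{B} &= 2^{15} \times 2^{12}\ \text{B} = 2^{27}\ \text{B} = 128\ \text{MiB}
\end{aligned}
```

With 8-byte block pointers instead of bits, the same 4K block would map only 512 blocks, or 2 MiB.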
2008-09-20 02:22 ok 2008-09-20 02:23 text strings of course 2008-09-20 02:23 I want to see waht kind of parser you write ;) 2008-09-20 02:23 lol 2008-09-20 02:23 something tells me you're a parser kind of guy 2008-09-20 02:23 the parser doesn't have to be efficient 2008-09-20 02:23 shift reduce is second nature 2008-09-20 02:23 hmm? 2008-09-20 02:23 since it'll be used barely ever 2008-09-20 02:23 hmm? 2008-09-20 02:23 sure 2008-09-20 02:23 shift reduce? 2008-09-20 02:24 lalr-1? 2008-09-20 02:24 context sensitive? 2008-09-20 02:24 well 2008-09-20 02:24 ok, you're going to have fun learning that 2008-09-20 02:24 should take you about a day 2008-09-20 02:24 I think I'm not aware of the terms you're using 2008-09-20 02:25 at least not in English 2008-09-20 02:25 basic parser lingo 2008-09-20 02:25 you're in for a formative experience 2008-09-20 02:25 that's a whole class of algorithmic thoughts you haven't had yet 2008-09-20 02:25 what kind of options do we need to parse anyway? 2008-09-20 02:25 opt=# 2008-09-20 02:25 opt=[on/off] 2008-09-20 02:26 on=[enum] 2008-09-20 02:26 erm opt=[enum] 2008-09-20 02:26 opt=[ip4|ip6|ip4:port...hostname,etc.] 2008-09-20 02:26 -omkfs 2008-09-20 02:26 mostly on/off present/not present and integers it seems 2008-09-20 02:27 case sensitivity? 2008-09-20 02:27 gnu opt syntax would be nice 2008-09-20 02:27 but 2008-09-20 02:27 you don't get to parse the command line 2008-09-20 02:27 just the -o part 2008-09-20 02:27 right 2008-09-20 02:27 it has a set syntax 2008-09-20 02:27 but you do end up getting a single char* 2008-09-20 02:27 every options parser I've seen is sickening 2008-09-20 02:28 what part is sickening? 2008-09-20 02:28 the implementation 2008-09-20 02:28 I meant which part ... what don't you like? 
2008-09-20 02:28 ext2/3 used to be even worse than they are 2008-09-20 02:28 cut and paste 2008-09-20 02:28 poor use of tables 2008-09-20 02:28 have to look in 3 places to see what's going on 2008-09-20 02:28 that kind of thing 2008-09-20 02:28 long code 2008-09-20 02:28 mostly fluff 2008-09-20 02:29 hmm 2008-09-20 02:30 { 2008-09-20 02:30 { "uid32", &flag, OP_OR, FLAG_UID32 } 2008-09-20 02:30 that's from? 2008-09-20 02:30 head 2008-09-20 02:30 right 2008-09-20 02:30 good 2008-09-20 02:31 less use of enums is better 2008-09-20 02:31 but hard to avoid entirely 2008-09-20 02:31 but yes, that's the way it should be 2008-09-20 02:31 I'm thinking a parse_options(char *, options_table_t *) 2008-09-20 02:31 and please no destroying the input string ;) 2008-09-20 02:32 sure, most should be handled by directly setting a flag, no special option code to invoke 2008-09-20 02:32 possibly multiple flags 2008-09-20 02:32 and setting clearing 2008-09-20 02:33 with appropriate callback, sure 2008-09-20 02:33 right callbacks 2008-09-20 02:33 also strings can be handled via store length and store pointer to non-asciiz 2008-09-20 02:33 that way we don't modify 2008-09-20 02:33 bonus points for being able to parse binary 2008-09-20 02:33 I _always_ store length 2008-09-20 02:34 right being able to parse 0x000 and 000b and 007 and 23#blah 2008-09-20 02:34 never asciiz except for throwaway 2008-09-20 02:34 although... 2008-09-20 02:34 converting commas to nulls in the input string, might be acceptable... 
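A table-driven parser along the lines discussed might look like the sketch below. It has one `parse_options()` entry point and table entries shaped like `{ "uid32", &flag, OP_OR, FLAG_UID32 }`. Flags are set or cleared straight from the table, and string values come back as pointer plus length so the input is never modified. Here the string is taken as `const char *` so the compiler enforces the no-destroying-the-input rule; the common `strsep()` approach writes NULs into it. Everything apart from `OP_OR` and the table shape is a hypothetical name. The kernel's own helpers for this job are `match_token()` and friends in `<linux/parser.h>`, which this userspace sketch does not use. Base-0 `strtoull` covers `0x1000` and leading-zero octal, but not the `000b` or `23#blah` forms mentioned.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical operations; OP_OR mirrors the table entry quoted above. */
enum opt_op { OP_OR, OP_CLEAR, OP_UINT, OP_STRING };

/* A value is a pointer into the caller's string plus a length: no NUL. */
struct opt_string {
	const char *ptr;
	size_t len;
};

struct option_entry {
	const char *name;
	void *target;    /* unsigned *, unsigned long long * or struct opt_string * */
	enum opt_op op;
	unsigned bits;   /* flag bits for OP_OR / OP_CLEAR */
};

/* Parse "a,b=1,c=xyz" against a NULL-terminated table without modifying
 * the input. Returns 0 on success, -1 on an unknown option or bad value. */
static int parse_options(const char *opts, const struct option_entry *table)
{
	assert(opts && table);
	while (*opts) {
		const char *comma = strchr(opts, ',');
		size_t len = comma ? (size_t)(comma - opts) : strlen(opts);
		const char *eq = memchr(opts, '=', len);
		size_t name_len = eq ? (size_t)(eq - opts) : len;
		const char *val = eq ? eq + 1 : NULL;
		size_t val_len = eq ? len - name_len - 1 : 0;
		const struct option_entry *e = table;

		while (e->name && !(strlen(e->name) == name_len &&
				    memcmp(e->name, opts, name_len) == 0))
			e++;
		if (!e->name)
			return -1;

		switch (e->op) {
		case OP_OR:
		case OP_CLEAR:
			if (val)
				return -1;    /* flags take no value */
			if (e->op == OP_OR)
				*(unsigned *)e->target |= e->bits;
			else
				*(unsigned *)e->target &= ~e->bits;
			break;
		case OP_UINT: {
			/* strtoull needs a terminated string, so copy the value */
			char buf[32], *stop;

			if (!val || val_len == 0 || val_len >= sizeof buf ||
			    !isdigit((unsigned char)val[0]))
				return -1;
			memcpy(buf, val, val_len);
			buf[val_len] = '\0';
			/* base 0 accepts 4096, 0x1000 and 010000 (octal) */
			*(unsigned long long *)e->target = strtoull(buf, &stop, 0);
			if (*stop)
				return -1;
			break;
		}
		case OP_STRING:
			if (!val)
				return -1;
			((struct opt_string *)e->target)->ptr = val;
			((struct opt_string *)e->target)->len = val_len;
			break;
		}
		opts += len;
		if (*opts == ',')
			opts++;
	}
	return 0;
}
```

For example, with a table containing `{ "mkfs", &flags, OP_OR, FLAG_MKFS }` (hypothetical flag), parsing the `-o` string `mkfs,blocksize=0x1000` sets that bit and stores 4096 in the `blocksize` target.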
2008-09-20 02:34 there isn't a unicode requirement 2008-09-20 02:34 fortunately 2008-09-20 02:34 utg8 2008-09-20 02:34 utf8 2008-09-20 02:34 not even 2008-09-20 02:34 should be utf8 clean 2008-09-20 02:35 though 2008-09-20 02:35 command lines are not utf8 2008-09-20 02:35 although that's obvious 2008-09-20 02:35 correct me if I'm wrong 2008-09-20 02:35 there is unfortunately no standard 2008-09-20 02:35 but utf8 is winning 2008-09-20 02:35 and the fs is utf8 2008-09-20 02:35 so they basically are 2008-09-20 02:35 utf8 options... 2008-09-20 02:35 probably not necssary 2008-09-20 02:36 probably also no hurt to support 2008-09-20 02:36 might not even need any code if done right 2008-09-20 02:36 I'm geeking the question 2008-09-20 02:37 hmm? 2008-09-20 02:37 what does that mean 2008-09-20 02:37 geeking? 2008-09-20 02:37 almost like googling 2008-09-20 02:37 well geeking the question 2008-09-20 02:37 oh 2008-09-20 02:37 except geekier 2008-09-20 02:38 so in case of conflicting options 2008-09-20 02:38 last one on the right wins 2008-09-20 02:38 right? 2008-09-20 02:38 or all get computed and accumulated 2008-09-20 02:40 http://developer.apple.com/technotes/tn2002/tn2065.html 2008-09-20 02:40 Q: What does do shell script do with non-ASCII text (accented characters, Japanese, etc.)? 
2008-09-20 02:40 useful commentary 2008-09-20 02:40 exit with an error 2008-09-20 02:40 on any option conflict 2008-09-20 02:41 informative error 2008-09-20 02:41 properly formatted 2008-09-20 02:41 saying what conflicted with what 2008-09-20 02:41 give up on first conflict 2008-09-20 02:42 that might be hard to do 2008-09-20 02:42 since what conflicts with what is a problem in and of itsel 2008-09-20 02:42 f 2008-09-20 02:42 if it's not hard you wont' get pheromones from it 2008-09-20 02:44 right 2008-09-20 02:44 the comments above from apple are not quire 2008-09-20 02:44 quite 2008-09-20 02:44 most linux international os'es 2008-09-20 02:44 have LC_ALL=something.utf-8 2008-09-20 02:44 $ locale 2008-09-20 02:44 LANG=en_US.UTF-8 2008-09-20 02:44 LC_CTYPE="en_US.UTF-8" 2008-09-20 02:44 LC_NUMERIC="en_US.UTF-8" 2008-09-20 02:44 LC_TIME="en_US.UTF-8" 2008-09-20 02:44 LC_COLLATE="en_US.UTF-8" 2008-09-20 02:44 LC_MONETARY="en_US.UTF-8" 2008-09-20 02:44 LC_MESSAGES="en_US.UTF-8" 2008-09-20 02:44 LC_PAPER="en_US.UTF-8" 2008-09-20 02:44 LC_NAME="en_US.UTF-8" 2008-09-20 02:44 LC_ADDRESS="en_US.UTF-8" 2008-09-20 02:44 LC_TELEPHONE="en_US.UTF-8" 2008-09-20 02:44 LC_MEASUREMENT="en_US.UTF-8" 2008-09-20 02:44 LC_IDENTIFICATION="en_US.UTF-8" 2008-09-20 02:44 LC_ALL= 2008-09-20 02:45 those that don't are broken ;-) and why your xchat didn't work 2008-09-20 02:45 and since multi-byte encodings (like unicode16 or nicode32) don't work in shell 2008-09-20 02:45 mount options themselves have no unicode impact 2008-09-20 02:45 precisely 2008-09-20 02:45 only possibility is values supplied to options 2008-09-20 02:45 so everything is ascii 2008-09-20 02:45 and I don't know of any of those 2008-09-20 02:46 theoreticaly a mount option could be the directory relative to root fs to mount as root 2008-09-20 02:46 and that might be utf8 2008-09-20 02:46 is there one of those? 
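The rule discussed above is to exit with an informative error on the first conflict, naming what conflicted with what. One way to keep that manageable is to treat the conflicts as data. A table of mutually exclusive flag pairs is checked once, after parsing, and reports the first pair that is violated. This is a sketch with hypothetical names and message text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical table entry: two flag bits that may not both be set. */
struct opt_conflict {
	unsigned a, b;
	const char *name_a, *name_b;
};

/* Returns the index of the first conflicting pair and writes a message
 * naming both options into msg, or returns -1 (empty msg) if none. */
static int check_conflicts(unsigned flags, const struct opt_conflict *tab,
			   size_t n, char *msg, size_t msglen)
{
	assert(msg && msglen > 0);
	msg[0] = '\0';
	for (size_t i = 0; i < n; i++)
		if ((flags & tab[i].a) && (flags & tab[i].b)) {
			snprintf(msg, msglen, "option \"%s\" conflicts with \"%s\"",
				 tab[i].name_a, tab[i].name_b);
			return (int)i;
		}
	return -1;
}
```

Adding a new incompatibility then means adding one table line, not another `if` somewhere in the option code.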
2008-09-20 02:46 but really at that point it's just a string of bytes 2008-09-20 02:46 there should be ;-) 2008-09-20 02:47 passphrase 2008-09-20 02:47 that too 2008-09-20 02:47 remote machines hostname 2008-09-20 02:47 when using international domains 2008-09-20 02:47 passphrase in cleartext on the mount command would be really lame ;) 2008-09-20 02:47 let's not do that 2008-09-20 02:47 well 2008-09-20 02:47 anyway, it's not so much a matter of support, but non-breaking it 2008-09-20 02:47 I suppose it's ok if nobody can see it (secret fstab) 2008-09-20 02:48 nah, not the right spot for passwords 2008-09-20 02:48 and tux3 shouldn't do crypto anyway, not really 2008-09-20 02:49 that's something you should definitely rely on dm-crypt or whatever to do 2008-09-20 02:49 it may end up doing some namespace stuff 2008-09-20 02:49 that only the filesystem can do 2008-09-20 02:49 oh, had an idea 2008-09-20 02:49 for pipe files 2008-09-20 02:50 data is a pipe, with appropriate flags, do cause reading/writing it to launch a zcat/gzip to data.gz 2008-09-20 02:50 userspace compression ;-) 2008-09-20 02:50 the pipe ends up being not seekable though 2008-09-20 02:50 ah, overload the file semantics 2008-09-20 02:50 but maybe it'd be of some use 2008-09-20 02:50 so you can plug a filter in front of a file 2008-09-20 02:50 right 2008-09-20 02:51 and remember that on the fs 2008-09-20 02:51 exactly 2008-09-20 02:51 should be fun to write up 2008-09-20 02:51 always wanted that 2008-09-20 02:51 but I didn't think it should be within the fs 2008-09-20 02:51 more a vfs thing 2008-09-20 02:51 like a per-file mount 2008-09-20 02:51 but then how do you make it persistent? 
2008-09-20 02:51 that's why the fs has to support storing it 2008-09-20 02:52 that's were the must-be-in-fs might come in 2008-09-20 02:52 and where once again xattr options could come in useful 2008-09-20 02:52 xattr has the nice benefit that archival utilities already support storing them 2008-09-20 02:52 trick is to come if with a mechanism-not-policy on that 2008-09-20 02:52 and have it be a nice mechanism and not totally single purpsoe 2008-09-20 02:53 yeah, the above is just a rough concept 2008-09-20 02:53 wow I let this vino stand on end too long 2008-09-20 02:53 corks resisting 2008-09-20 02:53 theoretically the command to run on read/write could be a xattr option 2008-09-20 02:53 maybe fifo better than pipe 2008-09-20 02:54 unsure 2008-09-20 02:54 maybe a fifo that becomes a pipe on read or write access 2008-09-20 02:54 that sounds better 2008-09-20 02:54 since I don't think unix support pipe on fs 2008-09-20 02:54 maybe there's a new syscall that associates 2008-09-20 02:54 unix does support pipe on fs 2008-09-20 02:54 where have you been? 2008-09-20 02:55 oh it does? 2008-09-20 02:55 use it heavily in zumastor 2008-09-20 02:55 almost never use pipes and fifos (except pipes in shell via | ) 2008-09-20 02:55 named pipes 2008-09-20 02:55 bash daemons ;) 2008-09-20 02:56 wow, that was tight 2008-09-20 02:56 probably the semantics are wrong though... or can you open the same pipe multiple times for write and /or read and not have conflicts? 2008-09-20 02:56 the cork? 
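The filter-file idea can be approximated in userspace with an ordinary named pipe. A background writer decompresses `data.gz` into a FIFO called `data`, so any plain reader sees the uncompressed bytes. This is a hypothetical stand-in (the helper name and file layout are made up), and it shows the limits raised above: the stream is not seekable, and each reader needs a fresh writer.

```sh
# Serve the decompressed contents of "$1.gz" through a named pipe "$1".
# Hypothetical helper; one background writer serves exactly one reader.
serve_through_fifo() {
    rm -f "$1"
    mkfifo "$1" || return 1
    # Opening a FIFO for writing blocks until a reader opens it, so the
    # writer waits in the background instead of stalling the caller.
    gzip -dc "$1.gz" > "$1" &
}
```

Making this persistent, so the filter is re-armed on every open, is exactly the part that needs filesystem support (for example a command stored in an xattr, as suggested above).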
2008-09-20 02:56 it's 3am 2008-09-20 02:56 bash semantics for pipes are slightly odd 2008-09-20 02:56 since it can't hold them open 2008-09-20 02:57 there also needs to be a way to store metadata associated with a file with information on when to invalidate it 2008-09-20 02:57 it relies on the fringe behavior 2008-09-20 02:57 what happens before the other side opens and after it closes 2008-09-20 02:57 or some very powerful way to keep track of file state 2008-09-20 02:57 well 2008-09-20 02:57 hold that thought ;) 2008-09-20 02:57 bash can keep em open 2008-09-20 02:57 till after the junkfs option parser 2008-09-20 02:57 use exec to redirect 2008-09-20 02:58 right, option parser first ;-) 2008-09-20 02:58 oh, sick 2008-09-20 02:58 I need to improve my demented index 2008-09-20 02:58 huh? 2008-09-20 02:58 redirecting a pipe in bash 2008-09-20 02:58 to keep it open 2008-09-20 02:58 sick 2008-09-20 02:58 huh 2008-09-20 02:58 why? 2008-09-20 02:59 it's how I do tcp connections in bash 2008-09-20 02:59 bash can keep em open <- just commenting 2008-09-20 02:59 well for our app that would be way sick 2008-09-20 03:00 why? 
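As noted, a shell can hold a pipe open with `exec`. Here is a minimal sketch, assuming Linux. Opening the FIFO read-write on one descriptor means the pipe never loses its last writer, so several writes and reads can go through without the reader seeing EOF in between. POSIX leaves `O_RDWR` on a FIFO undefined, and the name `open_fifo_rw` and descriptor 3 are arbitrary choices.

```sh
# Create FIFO $1 if needed and open it read-write on descriptor 3 of the
# current shell (exec with only redirections changes the shell itself).
open_fifo_rw() {
    [ -p "$1" ] || mkfifo "$1" || return 1
    exec 3<>"$1"
}
```

Without the read-write open, every writer's `close()` would hand the reader an EOF. That is the "fringe behavior" a bash daemon otherwise has to work around.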
2008-09-20 03:00 here let me find some code
2008-09-20 03:01 I know what you're talking about
2008-09-20 03:01 you don't know how we use pipes
2008-09-20 03:01 CR=`echo -en "\r"`
2008-09-20 03:01 open3() {
2008-09-20 03:01 exec 3<>/dev/tcp/$HOSTNAME/$HOSTPORT
2008-09-20 03:01 }
2008-09-20 03:01 if you did you wouldn't have to ask about the sick
2008-09-20 03:01 close3() {
2008-09-20 03:01 exec 3<&-
2008-09-20 03:01 }
2008-09-20 03:01 get() {
2008-09-20 03:01 open3 || { sleep 1; exit; }
2008-09-20 03:01 echo -en "GET $1 HTTP/1.0\r\n\r\n" >&3
2008-09-20 03:01 cat <&3
2008-09-20 03:01 close3
2008-09-20 03:01 }
2008-09-20 03:01 getfile() {
2008-09-20 03:01 get "$1" | while read line; do
2008-09-20 03:01 if [ "$line" == "" -o "$line" == "$CR" ]; then cat; exit; fi
2008-09-20 03:01 done
2008-09-20 03:01 }
2008-09-20 03:01 there
2008-09-20 03:01 wget
2008-09-20 03:01 heh
2008-09-20 03:01 ok
2008-09-20 03:01 that is sick
2008-09-20 03:01 really
2008-09-20 03:01 shapor needs to see it
2008-09-20 03:01 works wonders
2008-09-20 03:02 putfile() {
2008-09-20 03:02 open3
2008-09-20 03:02 cat > $TMPDIR/post-$$
2008-09-20 03:02 LEN=`wc -c < $TMPDIR/post-$$`
2008-09-20 03:02 echo -en "POST $1 HTTP/1.0\r\n" >&3
2008-09-20 03:02 echo -en "Content-Length: $LEN\r\n" >&3
2008-09-20 03:02 echo -en "\r\n" >&3
2008-09-20 03:02 cat $TMPDIR/post-$$ >&3
2008-09-20 03:02 rm -f $TMPDIR/post-$$
2008-09-20 03:02 close3
2008-09-20 03:02 }
2008-09-20 03:02 why in bash may I ask?
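The header-skipping half of `getfile` can be pulled out and tested offline. This sketch reads an HTTP response on stdin, drops lines up to the blank line that ends the headers (either empty or a lone carriage return), and then copies the body with `cat`. It replaces the bash-only `==` and `-o` with POSIX `=` and `||`. It uses `IFS= read -r` so backslashes and leading blanks survive, and it returns instead of exiting. The `/dev/tcp` half still needs bash, and `strip_http_headers` is a hypothetical name.

```sh
CR=$(printf '\r')

# Copy the body of an HTTP response on stdin to stdout, skipping headers.
# The shell's read does not consume past the newline on a pipe, so the
# remaining bytes are still there for cat.
strip_http_headers() {
    while IFS= read -r line; do
        if [ -z "$line" ] || [ "$line" = "$CR" ]; then
            cat                 # everything after the blank line is the body
            return
        fi
    done
}
```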
2008-09-20 03:02 even does post
2008-09-20 03:02 because this was for a disk-less system
2008-09-20 03:02 and the entire thing ran in bash basically
2008-09-20 03:03 sick^2
2008-09-20 03:03 nope
2008-09-20 03:03 this is where it gets sick:
2008-09-20 03:03 open3pr() {
2008-09-20 03:03 exec 3<>/dev/tcp/$IPPNAME/$IPPPORT
2008-09-20 03:03 }
2008-09-20 03:03 agreed
2008-09-20 03:03 putchar() {
2008-09-20 03:03 echo -en "\x"`printf "%02X" $1`
2008-09-20 03:03 }
2008-09-20 03:03 ippstr() {
2008-09-20 03:03 SIZE=`echo -en "$*" | wc -c`
2008-09-20 03:03 putchar $[$SIZE/256]
2008-09-20 03:03 putchar $[$SIZE%256]
2008-09-20 03:03 echo -en "$*"
2008-09-20 03:03 }
2008-09-20 03:03 ippstr3() {
2008-09-20 03:03 echo -en "$1"
2008-09-20 03:03 ippstr "$2"
2008-09-20 03:03 ippstr "$3"
2008-09-20 03:03 }
2008-09-20 03:03 print_header() {
2008-09-20 03:03 PRINTJOB=$[$PRINTJOB+1]
2008-09-20 03:03 echo -en "\1\1\0\2\0\0\0\1"
2008-09-20 03:03 echo -en "\1"
2008-09-20 03:03 ippstr3 "G" "attributes-charset" "iso-8859-1"
2008-09-20 03:03 ippstr3 "H" "attributes-natural-language" "en-us"
2008-09-20 03:03 ippstr3 "E" "printer-uri" "ipp://$IPPNAME:$IPPPORT/printers/$PRINTER"
2008-09-20 03:04 ippstr3 "B" "requesting-user-name" "root"
2008-09-20 03:04 ippstr3 "B" "job-name" "judge-job-$PRINTJOB-$1"
2008-09-20 03:04 ippstr3 "I" "document-format" "application/octet-stream"
2008-09-20 03:04 echo -en "\2"
2008-09-20 03:04 ippstr3 "B" "job-sheets" "none"
2008-09-20 03:04 ippstr3 "B" "" "none"
2008-09-20 03:04 echo -en "\3"
2008-09-20 03:04 }
2008-09-20 03:04 print() {
2008-09-20 03:04 open3pr
2008-09-20 03:04 TEMPFILE="$TMPDIR/header-$$"
2008-09-20 03:04 print_header "$1" >"$TEMPFILE"
2008-09-20 03:04 cat >> "$TEMPFILE"
2008-09-20 03:04 LENGTH=`wc -c<"$TEMPFILE"`
2008-09-20 03:04 echo -en "POST /printers/$PRINTER HTTP/1.1\r\n" >&3
2008-09-20 03:04 echo -en "Content-Length: $LENGTH\r\n" >&3
2008-09-20 03:04 echo -en "Content-Type: application/ipp\r\n" >&3
2008-09-20 03:04 echo -en "Host: $IPPNAME\r\n" >&3
2008-09-20 03:04 echo -en "\r\n" >&3
2008-09-20 03:04 cat "$TEMPFILE" >&3
2008-09-20 03:04 rm -f "$TEMPFILE"
2008-09-20 03:04 LENGTH=5
2008-09-20 03:04 #cat <&3
2008-09-20 03:04 while read line <&3; do
2008-09-20 03:04 if [ "$line" == "" -o "$line" == "$CR" ]; then break; fi
2008-09-20 03:04 case "$line" in
2008-09-20 03:04 "Content-Length: "*)
2008-09-20 03:04 LENGTH=`echo "$line" | sed "s/^Content-Length: //;s/\r\\$//"`
2008-09-20 03:04 ;;
2008-09-20 03:04 esac
2008-09-20 03:04 echo "$line"
2008-09-20 03:04 done
2008-09-20 03:04 dd bs=1 count=$LENGTH 2>/dev/null <&3 | xxd
2008-09-20 03:04 close3
2008-09-20 03:04 }
2008-09-20 03:04 and you have printing to an ipp print spool
2008-09-20 03:04 no bash-that-writes-bash?
2008-09-20 03:05 oh, let me find another snipper
2008-09-20 03:05 heh
2008-09-20 03:05 pastie please
2008-09-20 03:05 the /dev/tcp trick is disabled in bash on ubuntu/debian
2008-09-20 03:05 so we are hacking routers?
2008-09-20 03:05 dsl router or something?
2008-09-20 03:05 print server?
2008-09-20 03:05 no online contest judge system
2008-09-20 03:06 for programming contest
2008-09-20 03:06 ah
2008-09-20 03:06 basically my MSc thesis
2008-09-20 03:06 uses diskless nodes to perform testing of untrusted code
2008-09-20 03:06 your msc thesis was a contest?
2008-09-20 03:06 I see
2008-09-20 03:06 contest is the algorithm
2008-09-20 03:06 the code to support the national eliminations for ICPC
2008-09-20 03:06 called AMPPZ
2008-09-20 03:07 in Poland
2008-09-20 03:07 (ACM ICPC)
2008-09-20 03:07 so you wrote the judge script for it and got a msc for it?
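`ippstr3` writes IPP attributes in their wire form: a one-byte value tag, a two-byte big-endian name length, the name, a two-byte value length, and then the value (RFC 8010, originally RFC 2910). The letters used as tags are those bytes:

- G is 0x47 (charset).
- H is 0x48 (naturalLanguage).
- E is 0x45 (uri).
- B is 0x42 (nameWithoutLanguage).
- I is 0x49 (mimeMediaType).

An empty name, as in the second job-sheets line, marks an additional value of the previous attribute. The header `\1\1\0\2\0\0\0\1` is version 1.1, operation 0x0002 (Print-Job), request id 1. The POSIX-shell sketch below produces the same encoding without bash's `$[ ]` or `echo -en`. The function names are made up, and lengths are counted in bytes with `wc -c`.

```sh
# Emit one byte given its decimal value (0-255), via an octal escape.
put_byte() {
    printf "\\$(printf '%03o' "$1")"
}

# Emit a 16-bit big-endian length, as IPP uses for names and values.
put_u16() {
    put_byte $(( $1 / 256 ))
    put_byte $(( $1 % 256 ))
}

# Emit one attribute: tag byte ($1 as a character), name ($2), value ($3).
ipp_attr() {
    local nlen vlen
    nlen=$(printf '%s' "$2" | wc -c)
    vlen=$(printf '%s' "$3" | wc -c)
    printf '%s' "$1"
    put_u16 $((nlen))
    printf '%s' "$2"
    put_u16 $((vlen))
    printf '%s' "$3"
}
```

For example, `ipp_attr G attributes-charset utf-8` emits 28 bytes: 1 tag, 2 + 18 for the name, 2 + 5 for the value.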
2008-09-20 03:07 no, a lot more
2008-09-20 03:08 the rootfs file system, the stripped down compilers
2008-09-20 03:08 the judging server
2008-09-20 03:08 the scripting
2008-09-20 03:08 the firewall rules
2008-09-20 03:08 etc
2008-09-20 03:08 the entire system
2008-09-20 03:08 the whole infrastructure
2008-09-20 03:08 sandbox
2008-09-20 03:08 right
2008-09-20 03:08 yup
2008-09-20 03:08 still in active use for student coursework
2008-09-20 03:08 fun
2008-09-20 03:08 I was so much more boring as a student
2008-09-20 03:08 makes'em write code that actually runs and works on frickin' wickedly selected tests
2008-09-20 03:09 writing compilers and such
2008-09-20 03:09 ...still searching...
2008-09-20 03:10 $ cat rpc.sh
2008-09-20 03:10 #!/bin/echo You must include this file:
2008-09-20 03:10 escape() {
2008-09-20 03:10 local arg
2008-09-20 03:10 local ch
2008-09-20 03:10 for arg in "$@"; do
2008-09-20 03:10 echo -n " \$'"
2008-09-20 03:10 echo -n "${arg}" \
2008-09-20 03:10 | while IFS= read -n 1 -r ch; do
2008-09-20 03:10 echo -n "\\x$(xxd -ps -l1 <<< "${ch}")"
2008-09-20 03:10 done
2008-09-20 03:10 echo -n "'"
2008-09-20 03:10 done
2008-09-20 03:10 echo
2008-09-20 03:10 }
2008-09-20 03:10 rpc() {
2008-09-20 03:10 local HOST="$1"
2008-09-20 03:10 shift
2008-09-20 03:10 local PROC="$1"
2008-09-20 03:10 shift
2008-09-20 03:10 local FUNC=$(type "${PROC}" | sed -rn '2,$p')
2008-09-20 03:10 # echo "rpc HOST[${HOST}] PROC[${PROC}] FUNC[${FUNC}] ARGS[$*]"
2008-09-20 03:10 local ARGS=`escape "$@"`
2008-09-20 03:10 ssh -ax "root@${HOST}" "${FUNC}; ${PROC}${ARGS}" }
2008-09-20 03:11 usage:
2008-09-20 03:11 rpc hostname shell_function parameters...
2008-09-20 03:11 executes shell_function on remote machine
2008-09-20 03:11 using a ssh-based rpc scheme
2008-09-20 03:11 yup
2008-09-20 03:12 crazy eh?
2008-09-20 03:12 well I was going to say, you out leeted the zumastor script, but then... did you write daemons in bash?
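`escape` ships each argument as a `$'\xNN...'` string, which requires bash on the far end and `xxd` locally. A common alternative is POSIX single-quoting: wrap each argument in `'...'` and rewrite every embedded `'` as `'\''`. This is a swapped-in technique, not what rpc.sh does. The sketch below builds the remote call string that way and checks it locally with `eval` instead of `ssh`. `sh_quote` and `build_call` are hypothetical names. In bash, `declare -f NAME` prints a function's definition directly and could stand in for the `type | sed -rn '2,$p'` trick.

```sh
# Quote one argument so a POSIX shell re-parses it as a single word.
# Note: trailing newlines in the argument are lost to $(...).
sh_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Print "PROC 'arg1' 'arg2' ..." ready to be run by a remote shell.
build_call() {
    local call arg
    call=$1
    shift
    for arg in "$@"; do
        call="$call $(sh_quote "$arg")"
    done
    printf '%s\n' "$call"
}

# Possible use from bash (assumes ssh access to $HOST):
#   ssh -ax "root@$HOST" "$(declare -f "$PROC"); $(build_call "$PROC" "$@")"
```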
2008-09-20 03:12 local FUNC=$(type "${PROC}" | sed -rn '2,$p') 2008-09-20 03:12 this line is the kicker 2008-09-20 03:12 I have a web server running in bash, yes 2008-09-20 03:13 ok 2008-09-20 03:13 we're officially outleeted 2008-09-20 03:13 although it cheats and uses xinetd to launch itself 2008-09-20 03:13 since I can't figure out how to do listens in pure bash 2008-09-20 03:13 our daemons listen on pipes, but of course you need socks 2008-09-20 03:14 I can listen on a nc pipe 2008-09-20 03:14 they do spawn other daemons 2008-09-20 03:14 but that makes it single threaded 2008-09-20 03:14 so only one outstanding request 2008-09-20 03:14 right 2008-09-20 03:14 that was a pain 2008-09-20 03:14 oh, I also had a proxy running in bash 2008-09-20 03:14 somebody wrote a little bit of c to do nonblocking read 2008-09-20 03:14 I felt that was cheating 2008-09-20 03:15 yup 2008-09-20 03:15 anyway my proxy is 4K bash code 2008-09-20 03:15 I need to test the code 2008-09-20 03:15 before I sleep 2008-09-20 03:15 mostly debug and strings really 2008-09-20 03:15 test this dleaf walker 2008-09-20 03:15 it's been kind of a block for me 2008-09-20 03:15 unfun 2008-09-20 03:15 okay, I'm going to bed, since I have to get up at 9 2008-09-20 03:15 needs to be done 2008-09-20 03:15 is in the way of finishing extents 2008-09-20 03:15 you enjoy your testing... 2008-09-20 03:16 I won't, but the chat was fun 2008-09-20 03:16 and bh didn't share his idea with us ;-( 2008-09-20 03:16 maybe next time 2008-09-20 03:17 8910 flips 2008-09-20 03:17 3261 MaZe 2008-09-20 03:17 I think those cut'n'pastes should classify as cheating 2008-09-20 03:17 your ratio is catching up 2008-09-20 03:17 oh yeah 2008-09-20 03:17 well most of my chat is cut and paste too 2008-09-20 03:17 I've said this all before ;) 2008-09-20 03:18 you switched to another username to let me catch up 2008-09-20 03:18 oh right 2008-09-20 03:18 you just hit 10K 2008-09-20 03:19 anyway, enough is enough... 
2008-09-20 03:19 good night and good testing 2008-09-20 03:19 good night 2008-09-20 03:22 ah! 2008-09-20 03:22 brilliance 2008-09-20 03:23 the OP_OR will actually be the name of a function 2008-09-20 03:23 it will simply be a callback 2008-09-20 03:23 find 2008-09-20 03:23 fine 2008-09-20 03:23 anyway back to bed ;-) 2008-09-20 03:23 heh 2008-09-20 03:23 just like me 2008-09-20 03:24 dleaf probe seems to be working 2008-09-20 03:24 wish I'd tested it earlier 2008-09-20 03:24 could have moved on 2008-09-20 03:25 whoops, nope 2008-09-20 07:07 -!- amey(~amey@116.73.35.180) has joined #tux3 2008-09-20 07:10 -!- amey(~amey@116.73.35.180) has left #tux3 2008-09-20 09:29 -!- BSD(~bandan@38.117.250.152) has joined #tux3 2008-09-20 11:19 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-20 12:01 I copied over the patches from Daniel's tree. Until he fixes cloning, feel free to clone it from here: git clone git://makefile.in/tux3fs 2008-09-20 12:02 I will update it with patches from his tree regularly 2008-09-20 14:19 folks 2008-09-20 14:33 bsd, thanks 2008-09-20 14:38 flipz: np 2008-09-20 14:38 ACTION found a nice weekend to look around the tux3fs code 2008-09-20 15:50 BSD: yeah me too, I'm finally getting a chance to look at the lowlevel fuse api 2008-09-20 15:51 it is a nice weekend indeed 2008-09-20 15:51 i got sidetracked with a memory leak in inodetest last night 2008-09-20 15:51 and then fell asleep 2008-09-20 15:52 its strange if i put the main return right after make_tux3 it leaks even more memory 2008-09-20 15:52 someone else can chew on that one 2008-09-20 15:52 it's a known behavior of valgrind 2008-09-20 15:53 oh? 2008-09-20 15:53 has something to do with c library cleanup 2008-09-20 15:53 so not a bug? 
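One way to read the OP_OR remark is that the parser no longer switches on an opcode; instead, each option table entry names a callback that applies the option. Here is a POSIX-shell sketch under that assumption. The table, `op_or`, `op_andnot` and the flag bit values are all hypothetical.

```sh
# Hypothetical option table: name, callback function, callback argument.
OPTTABLE='ro op_or 1
noatime op_or 2
rw op_andnot 1'

FLAGS=0
op_or()     { FLAGS=$(( FLAGS | $1 )); }     # set flag bit(s)
op_andnot() { FLAGS=$(( FLAGS & ~$1 )); }    # clear flag bit(s)

# Look up option $1 and invoke its callback in the current shell.
apply_opt() {
    local entry
    entry=$(printf '%s\n' "$OPTTABLE" | awk -v n="$1" '$1 == n { print $2, $3; exit }')
    if [ -z "$entry" ]; then
        echo "junkfs: unknown option '$1'" >&2
        return 1
    fi
    set -- $entry               # $1 = callback name, $2 = its argument
    "$1" "$2"
}
```

New kinds of option then need a new callback and a table line, with no change to the parser loop.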
2008-09-20 15:53 whether or not _exit gets called or something 2008-09-20 15:53 not sure 2008-09-20 15:53 yeah i didnt find anything 2008-09-20 15:53 maybe not a bug 2008-09-20 15:53 I used to know and I forgot 2008-09-20 15:54 vtable inode was getting detected as leaked 2008-09-20 15:54 ah 2008-09-20 15:54 but if i comment initialized it, other shit get detected 2008-09-20 15:54 (the next malloc) 2008-09-20 15:54 hmm 2008-09-20 15:54 maybe its because of some memmove or something 2008-09-20 15:54 confusing valgrind 2008-09-20 15:55 nice to know how valgrind works 2008-09-20 15:55 vtable is the only inode that doesn't actually get used 2008-09-20 15:55 I should remove it 2008-09-20 15:55 yeah thats why i figured just comment it out 2008-09-20 15:55 but it didnt help ;) 2008-09-20 15:55 it's out of the design already 2008-09-20 15:55 posted a nice troll to lkml over that and nobody bit 2008-09-20 15:56 yeah i was notcing we still have a version.c or something too 2008-09-20 15:56 i dont buy your argument 2008-09-20 15:56 wanted to get a discussion going re whether that stuff should be in the filesystem or the volume manager 2008-09-20 15:56 well 2008-09-20 15:56 discuss or shut up ;) 2008-09-20 15:56 er i mean volume.c 2008-09-20 15:56 you see a use for subvolumes? 2008-09-20 15:56 I don't any more 2008-09-20 15:57 funny, we started from the opposite positions 2008-09-20 15:57 sharing free space? 2008-09-20 15:57 right 2008-09-20 15:57 useless 2008-09-20 15:57 super useful 2008-09-20 15:57 they should share volume free space 2008-09-20 15:57 only 2008-09-20 15:57 it allows you to compartmentalize 2008-09-20 15:57 you can do the same with volumes 2008-09-20 15:58 share free space? 
2008-09-20 15:58 yes 2008-09-20 15:58 lvm3 2008-09-20 15:58 lvm2 even, though lame 2008-09-20 15:59 so your solution is to punt to the block layer 2008-09-20 15:59 yes 2008-09-20 16:00 layering 2008-09-20 16:00 there's no compelling argument to do otherwise 2008-09-20 16:00 the strongest argument is, perhaps you could get finer grained free space sharing within the fs 2008-09-20 16:00 but that's bad for performance 2008-09-20 16:00 see the fsync story 2008-09-20 16:00 also matt's comments 2008-09-20 16:00 yeah 2008-09-20 16:00 hmm 2008-09-20 16:00 his granularity is 128 MB, ours should be too 2008-09-20 16:01 since zfs has such lame quota support what people do is create one volume per user for example 2008-09-20 16:01 /home/user is a volume 2008-09-20 16:01 and let them all share free space 2008-09-20 16:02 lets do quots right then 2008-09-20 16:02 so if you only had a /home volume 2008-09-20 16:02 all the users would share free space anyway i suppose 2008-09-20 16:03 let's do per directory quota, with hardlinks between them disabled 2008-09-20 16:03 hm 2008-09-20 16:03 the rule is: a directory with quota may not have hard links into it 2008-09-20 16:03 just can't do that boz 2008-09-20 16:04 sounds reasonable 2008-09-20 16:04 you mean out of it 2008-09-20 16:05 the direction is ambiguous 2008-09-20 16:05 not in or out 2008-09-20 16:05 just like a home directory 2008-09-20 16:05 well 2008-09-20 16:05 are there any hardlinked things in home directories? 2008-09-20 16:05 not usually. 2008-09-20 16:05 lets see 2008-09-20 16:06 is there a way to perl that? 2008-09-20 16:06 maybe with the help of locate? 2008-09-20 16:06 challenge 2008-09-20 16:07 find 2008-09-20 16:07 ? 2008-09-20 16:07 find -links 2 ! -type d 2008-09-20 16:07 how do you reject internal links? 
2008-09-20 16:08 cant 2008-09-20 16:08 would be -links +1 i think 2008-09-20 16:08 "more than 1 link" 2008-09-20 16:08 hard to find those hard links 2008-09-20 16:08 needs help from the indexer 2008-09-20 16:08 locate 2008-09-20 16:08 somehow 2008-09-20 16:09 hmm 2008-09-20 16:09 no 2008-09-20 16:09 i'm not even finding any internal links 2008-09-20 16:09 it's much harder 2008-09-20 16:09 have to list by inode 2008-09-20 16:09 and check everything outside by inode 2008-09-20 16:10 I don't know of any common use for hardlinks across home dir boundaries 2008-09-20 16:10 no its not hard 2008-09-20 16:10 you certainly can't have them if the home dir is a separate volume like zfs 2008-09-20 16:10 just print out nlinks and inode number 2008-09-20 16:10 well hrm 2008-09-20 16:11 you can ls the inodes inside and outside and intersect 2008-09-20 16:11 well 2008-09-20 16:11 the intersection has to be empty 2008-09-20 16:11 translate into simple bash ;) 2008-09-20 16:12 there is a way 2008-09-20 16:14 find, excluding the home dir, recursive list all inodes 2008-09-20 16:14 then recursive list the inodes of the dir 2008-09-20 16:14 find -links +1 ! -type d -printf "%i %n\n" | sort | uniq -c | awk '($1!=$3) {print}' 2008-09-20 16:14 no inode in the second is allowed to be in the first 2008-09-20 16:14 onliner boo yeah 2008-09-20 16:14 oneliner* 2008-09-20 16:14 it works? 2008-09-20 16:14 duh 2008-09-20 16:14 i wrote it 2008-09-20 16:15 of course it does 2008-09-20 16:15 but does it work the way we hope 2008-09-20 16:15 it only finds files which are externally linked 2008-09-20 16:15 or link to external files 2008-09-20 16:15 same thing really 2008-09-20 16:15 yes 2008-09-20 16:15 symmetric 2008-09-20 16:15 it only prints the inode number unfortunately 2008-09-20 16:16 did it find any? 2008-09-20 16:16 so you have to do another pass and find which file is the offender 2008-09-20 16:17 why won't it find hardlinks within a directory? 2008-09-20 16:19 find -links +1 ! 
-type d -printf "%i %n\n" | sort | uniq -c | awk '($1!=$3) {print $2}' | xargs --no-run-if-empty -n 1 find -inum 2008-09-20 16:19 that does the pass to translate inodes to filenames 2008-09-20 16:19 why won't it find hardlinks within a directory? <- still wondering 2008-09-20 16:19 ok 2008-09-20 16:20 for the first find 2008-09-20 16:20 it prints out the inode number and nlinks 2008-09-20 16:20 for each file with more than 1 link 2008-09-20 16:20 then the sort | uniq -c 2008-09-20 16:20 counts how many times each occurs 2008-09-20 16:20 oh right 2008-09-20 16:20 yes 2008-09-20 16:20 good 2008-09-20 16:20 nice 2008-09-20 16:20 then the awk compares 2008-09-20 16:20 and prints out the inode number if they dont match 2008-09-20 16:20 i tested it it works 2008-09-20 16:20 wait 2008-09-20 16:21 it will still pick up hardlinks strictly within the dir 2008-09-20 16:21 no 2008-09-20 16:21 it will not 2008-09-20 16:21 oh right 2008-09-20 16:24 got it 2008-09-20 16:24 you look for number of hard links greater than number of occurances within the dir 2008-09-20 16:25 sweet 2008-09-20 16:25 or != 2008-09-20 16:25 no, can't be < 2008-09-20 16:25 but you check for less anyway ;) 2008-09-20 16:25 hoping to find disk corrpution I assume 2008-09-20 16:31 getting close to sk8 oclock 2008-09-20 16:40 i skated this morning 2008-09-20 16:41 and? 2008-09-20 16:41 so no more for me 2008-09-20 16:41 lamer 2008-09-20 16:41 got other stuff to do 2008-09-20 16:41 like... figure out the fuse stuff 2008-09-20 16:41 they all say that 2008-09-20 16:41 oh right 2008-09-20 17:40 so what's the difference between volumes and subvolumes? 
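The one-liner works because, for every inode with more than one link, `uniq -c` counts how many of its names lie inside the tree. `awk` then prints the inode when that count differs from the total link count `%n`. Links that stay entirely inside the directory balance out, which is why internal hardlinks are not reported. The version below wraps the one-liner as two functions (hypothetical names) and adds `-xdev`, since inode numbers are only unique within one filesystem. Like the original, it needs GNU find for `-printf`.

```sh
# Print inode numbers of non-directories under $1 that also have
# names outside $1 (count of names inside != total link count).
external_links() {
    find "$1" -xdev ! -type d -links +1 -printf '%i %n\n' \
        | sort | uniq -c | awk '$1 != $3 { print $2 }'
}

# Map each offending inode back to its paths inside $1.
external_link_paths() {
    external_links "$1" | while read -r ino; do
        find "$1" -xdev -inum "$ino"
    done
}
```

An empty result from `external_links "$HOME"` is the precondition for the proposed rule that a directory with a quota may not have hard links crossing its boundary.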
i'm thinking subvolumes might be helpful for some security partitioning 2008-09-20 17:43 bushman, a subvolume shares nothing but the allocation space with another subvolume 2008-09-20 17:43 since it can share allocation space, it goes in the opposite direction of security partitioning 2008-09-20 17:44 what you want are real volumes 2008-09-20 17:44 which is another reason I real it is right to drop subvolumes 2008-09-20 17:44 so to have separate partitions for data with different labels, we'd just have regular volumes, or use traditional partitions? 2008-09-20 17:45 yes 2008-09-20 17:45 any drawback? 2008-09-20 17:45 well wait 2008-09-20 17:45 we're going to be able to separate the allocation space for data within the same namespace 2008-09-20 17:46 and we're going to be able to separate namespaces 2008-09-20 17:46 now a multimillion dollar question: could i make them transparent? as in if i'm a user with two levels, can i see both partitions/volumes on top of each other? 2008-09-20 17:46 just not using the subvolume idea 2008-09-20 17:46 yes 2008-09-20 17:46 that's the plan 2008-09-20 17:46 do you know where i'm going with this? i want polyinstantinated directories 2008-09-20 17:47 try again with words of one syllable 2008-09-20 17:47 that might be more namespacing tricks tho, i'm not sure how you'd design it 2008-09-20 17:47 with namespacing tricks 2008-09-20 17:47 I'm working on it 2008-09-20 17:48 heirarchically inherited namespaces to be exact 2008-09-20 17:48 much like the versioning model 2008-09-20 17:48 except its not versions, it's namespaces 2008-09-20 17:49 I suspect that amounts to polyinstantinated directories 2008-09-20 17:49 they look different depending on who you are 2008-09-20 17:49 i basically want a /home/marcin directory, and if i got lets' say S and TS labels then i see both sets of files inside my home dir, but let's say i've been naughty and they pulled my TS, and I should be seeing only S files. 
however the rest of my files shouldnt delete, just sit there for someone with a security dominating mine to pick them up or whatever 2008-09-20 17:50 yes 2008-09-20 17:50 you put it much more succintly 2008-09-20 17:50 correct, that's the plan 2008-09-20 17:50 you put it in terms that don't require leaps of logic 2008-09-20 17:51 awesome, as long as the files are stored of different physical disks or paritions 2008-09-20 17:51 right 2008-09-20 17:51 namespace partitioning is one of two forms of partitioning we have in mind 2008-09-20 17:51 the other is physical data 2008-09-20 17:51 there's a lot of discussion lately if one disk with different encryptions should be treated as equivalent to multiple disks 2008-09-20 17:52 partitioned onto different volumes according to the class of data, and the filesystem amalgamates those volumes into a... filesystem 2008-09-20 17:52 to be more precise, the volume manager amalgamates those volumes, but the filesystem knows the layout 2008-09-20 17:53 and does data allocation accordingly 2008-09-20 17:53 this differs somewhat from the zfs model 2008-09-20 17:53 which takes the task of amalgamation into itself 2008-09-20 17:54 tux3 sitting on lvm3 will just want to look at the lvm's mapping table 2008-09-20 17:54 and be able to specify how the lvm should change that table 2008-09-20 17:55 so that it can provision itself with as much of the different kinds of storage as it needs, in the places it wants it 2008-09-20 17:55 so you guys wanna utilize the lvm underneath? i thought the whole idea of zumastore is to eliminate it? 
2008-09-20 17:55 rewrite the lvm 2008-09-20 17:55 that will be lvm3 2008-09-20 17:55 but we can get by with lvm2 2008-09-20 17:55 it just sucks for adminning 2008-09-20 17:55 everything manual 2008-09-20 17:55 we want automatic 2008-09-20 17:56 sounds great 2008-09-20 17:56 that's what we think :) 2008-09-20 17:56 it's one of those itches 2008-09-20 17:56 multi year itch 2008-09-20 17:56 if i can demo any of this, in however raw form, i will definitely get some raised eyebrows 2008-09-20 17:57 I think we can get an early demo, yes 2008-09-20 17:57 we'll do the provisioning manually, using the existing lvm 2008-09-20 17:57 and the fs will proceed to partition data as promised 2008-09-20 17:57 partitioning namespace requires more effort 2008-09-20 17:57 we'd have to find a way of bootstrapping that project wise 2008-09-20 18:00 so how would you deal with union'ed namespaces? 2008-09-20 18:00 if i got /home/marcin/attackatdawn(TS) and another one at (S), how would it show? 2008-09-20 18:01 and more importantly, how does a user pick which one they're dealing with? 2008-09-20 18:02 when you have high security clearance you have the option of covering up a lower clearance file by creating one of the same name 2008-09-20 18:02 would it internally be stored as /home/marcin/TS/file and /home/marcin/S/file or something wackier than that? 2008-09-20 18:03 a tag goes on to the beginning of the filename internally 2008-09-20 18:03 define 'covering up' 2008-09-20 18:03 and is part of the namespace lookup 2008-09-20 18:03 covering up means: by default you will get EXIST, but you can override that and create a new entry that overrides the old one 2008-09-20 18:04 obviously, it is best not to cover up a lower security file 2008-09-20 18:04 override as in delete the old one? 2008-09-20 18:04 but you can if you want 2008-09-20 18:04 no, as in both exist 2008-09-20 18:04 override as in cover up the old one. 
It will reappear if you delete yours 2008-09-20 18:04 but only highest security you have access to can be read 2008-09-20 18:04 so i cannot access both at the same time? 2008-09-20 18:04 right 2008-09-20 18:04 no 2008-09-20 18:04 you could access the S if the TS privs were dropped 2008-09-20 18:04 if you want that, log in as two people or don't make the names collide 2008-09-20 18:05 you'd usually be doing this anyway to store junk misdirecting data at a lowe sec level 2008-09-20 18:05 if i wanted to have multiple users for multiple levels of security, i'd just stick to regular systems, not building MLS one 2008-09-20 18:05 bushman, we could fiddle around with the idea and make both visible through some messed up name syntax, it's hard to see why that would be better though 2008-09-20 18:06 agreed 2008-09-20 18:06 If you're going to access it via diff names 2008-09-20 18:06 why not give it diff names to begin with? 2008-09-20 18:06 i need to be able to access all versions if my security dominates labeles on multiple files 2008-09-20 18:06 rename TS 2008-09-20 18:06 access S 2008-09-20 18:06 tha'ts why it's polyinstantiated 2008-09-20 18:06 rename TS back to old name 2008-09-20 18:06 yes 2008-09-20 18:07 Bushman: you're making this sound like resource forks... 2008-09-20 18:07 bushman, the problem is, you start drifting away from unix semantics 2008-09-20 18:07 no, the idea is to have multiple files with same name at different levels and access them all as long as i'm cleared 2008-09-20 18:07 but if they have the same name.. 2008-09-20 18:07 bushman, ok, could you show an example of accessing two of them? 2008-09-20 18:08 what command would you write? 
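The covering-up semantics described above can be modelled in a few lines. Each name carries a security tag that is part of the lookup. Creating a name over a visible lower-level copy fails with EEXIST unless you ask to cover it. Lookup returns the highest copy at or below your clearance, and deleting your copy makes the covered one reappear. This is a toy model, not tux3 code. The directory is a text file of `LEVEL NAME` lines, the levels are numbers (0=U, 1=C, 2=S, 3=TS), names may not contain spaces, and all function names are hypothetical.

```sh
# Print the level of the copy of NAME visible at CLEARANCE: the highest
# tagged copy at or below the clearance. Fails (ENOENT) if none.
dir_lookup() {  # dirfile clearance name
    awk -v c="$2" -v n="$3" -v best=-1 '
        $2 == n && $1 <= c && $1 > best { best = $1 }
        END { if (best < 0) exit 1; print best }' "$1"
}

# Create NAME at LEVEL. A visible copy makes this fail with EEXIST,
# unless it is at a lower level and the fourth argument is "cover".
dir_create() {  # dirfile level name [cover]
    local vis
    if vis=$(dir_lookup "$1" "$2" "$3"); then
        if [ "$vis" -eq "$2" ] || [ "${4:-}" != cover ]; then
            echo "EEXIST" >&2
            return 1
        fi
    fi
    printf '%s %s\n' "$2" "$3" >> "$1"
}

# Remove the copy tagged exactly LEVEL; a covered lower copy reappears.
dir_unlink() {  # dirfile level name
    local tmp="$1.tmp"
    awk -v l="$2" -v n="$3" '!($1 == l && $2 == n)' "$1" > "$tmp" && mv "$tmp" "$1"
}
```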
2008-09-20 18:08 oh i know, that's why i came to you guys with this, it's out there stuff 2008-09-20 18:08 I guess you could suffix the filenames with somthing like /path/file#S to override access to /path/file (TS) into /path/file (S) 2008-09-20 18:08 but we'd need to waste a character 2008-09-20 18:08 for the # symbol 2008-09-20 18:08 that's not unix though 2008-09-20 18:08 ok, let's make a user marcin with labels of s0,s1 2008-09-20 18:08 agreed 2008-09-20 18:09 bushman, we can apply the versioned symlink idea 2008-09-20 18:09 that lets you access files from different versions, on the same version 2008-09-20 18:09 so in this case, you'd have a priviledged symlink to the directory you're in 2008-09-20 18:10 i need to have files /supersecretcrap (s0) and /supersecretcrap (s1), and i should be able to pick either one to work on 2008-09-20 18:10 when you read the directory through the symlink, you see one view, and a different view if you read it directly 2008-09-20 18:10 so why not call them /s0_supersecretcrap and /s1_supersecretcrap? 2008-09-20 18:10 privileged symlink? what's that? 2008-09-20 18:10 bushman, new invention 2008-09-20 18:10 just now 2008-09-20 18:10 based on the idea of versioned symlink I have written about 2008-09-20 18:11 Maciek, that's not my call, that's what the bigwigs in bunkers want ;) 2008-09-20 18:11 only works if the symlink is parsed at the sub-vfs layer - or if we muck around with the vfs 2008-09-20 18:11 maze, that is the plan 2008-09-20 18:11 sub-vfs 2008-09-20 18:11 we will need to extent some syscalls 2008-09-20 18:11 extend 2008-09-20 18:11 so to apps, does it look like a ymlink? 2008-09-20 18:11 one syscall, actually, ln 2008-09-20 18:12 it looks like a symlink yes 2008-09-20 18:12 why extend the syscall? can't we make the data stored in a symlink (it's binary remember) suffice? 2008-09-20 18:12 oh so you want a file /secretcrap with multiple symlinks to it called s0, or s1? 
2008-09-20 18:12 no, to the entire directory 2008-09-20 18:12 ln needs to know how to create one, until it does we provide our own utility 2008-09-20 18:12 to create one of these 2008-09-20 18:13 bushman, yes 2008-09-20 18:13 ln -s 'binary_blob' filename 2008-09-20 18:13 including no symlink 2008-09-20 18:13 maze, possibly ;) 2008-09-20 18:13 I don't like this 2008-09-20 18:13 it's dirty 2008-09-20 18:13 that'd work, cuz you can still work on it as a normal file...hmm 2008-09-20 18:13 make, your idea? 2008-09-20 18:13 versioned symlinks is clean 2008-09-20 18:13 what's the problem? 2008-09-20 18:14 it's fine for versioning 2008-09-20 18:14 I'm not sure I like it for this s* crap 2008-09-20 18:14 don't have to use it 2008-09-20 18:14 with versioning you get a view from the past that's not mutatable 2008-09-20 18:14 flips, one question: would the links be created dynamically when requesting info on a file, or would they actually be laying around at all times for people to use? 2008-09-20 18:14 maze, not so 2008-09-20 18:14 versions are all rw 2008-09-20 18:15 oh, right 2008-09-20 18:15 bushman, they'd be lying around, or you'd create them if you have the right clearance 2008-09-20 18:15 sun is getting lower 2008-09-20 18:16 I need to hustle out if I'm going to get to the strand before nightfall 2008-09-20 18:17 so would a regular user see /file /filesecret /filetopsecret with the last two being links, or the real /file would be not visible, only the links would be visible? 
2008-09-20 18:18 I believe a regular user would not see anything 2008-09-20 18:18 since he'd have no clearance 2008-09-20 18:18 the regular users would only see /file 2008-09-20 18:18 well 2008-09-20 18:18 i'm trying to prevent mistakes from mindlessly picking a wrong link/file, we all know how unaware of ownership/permissions most people are 2008-09-20 18:18 if file is low clearance 2008-09-20 18:18 no no, regular user i meant not root ;) 2008-09-20 18:19 you don't even see the links unless they exist at or below your level 2008-09-20 18:19 not a user that's cleared only to unclass 2008-09-20 18:19 same with the files 2008-09-20 18:19 I believe unclass user would see zilch 2008-09-20 18:19 yes 2008-09-20 18:19 secret user would see file and filesecret 2008-09-20 18:19 topsecret would see all 3 2008-09-20 18:19 yes 2008-09-20 18:19 if you see filesecret or/and filetopsecret it's a symlink 2008-09-20 18:19 maze, I think /file is supposed to be unsecret 2008-09-20 18:20 oh, I don't know, and that's one of the reasons I don't like this 2008-09-20 18:20 yea, that's just standard domination/lattice based security scheme, SElinux does it for you behind the scenes 2008-09-20 18:20 wait till i throw in compartments into TS ;) 2008-09-20 18:21 bushman, it would not make sense to create a security link for a high clearance level, so that a low clearance person can see it 2008-09-20 18:21 they're just there for the convenience of the supersecret spooks 2008-09-20 18:21 so nobody less secret has to see them 2008-09-20 18:21 yes of course 2008-09-20 18:22 i dunno if shap talked to you about the whole Goguen-Meseguer non-interference model yet 2008-09-20 18:22 i spent like 2hrs talking him through the subtleties of that and general 'need to know' stuff 2008-09-20 18:23 bushman, you know what is really cool about this?
it all happens far away from the vfs, which never gets to see it 2008-09-20 18:23 which makes it harder to subvert 2008-09-20 18:23 that's the goal 2008-09-20 18:24 otoh 2008-09-20 18:24 this is precisely what the vfs should be dealing with... 2008-09-20 18:24 you mean the 'visibility' issues? 2008-09-20 18:24 yes, all of it 2008-09-20 18:25 there is no benefit to doing this below the vfs, except for code duplication across fs'es 2008-09-20 18:25 and multiple opportunities to screw it up 2008-09-20 18:25 and If someone compromises the vfs, they've got your kernel and your fs drivers as well 2008-09-20 18:25 other than political reasons of people saying 'no' simply because noone else but tux using these features 2008-09-20 18:25 [unless we're not talking about linux here] 2008-09-20 18:26 i know Daniel got a lot of pull, but i dunno how much we can force down these people's throats for the sake for esoteric security features 2008-09-20 18:26 ah, yes, politics... 2008-09-20 18:27 so for political reasons we might have to smuggle this shit under tux's branch so the rest of people dont have any say in it 2008-09-20 18:27 thing is this should be done with xattr and something selinux like 2008-09-20 18:28 yes, SElinux does it all on xattr, that's how we get all the functionality i need 2008-09-20 18:28 so what are you missing? 2008-09-20 18:28 multiple clearance levels for diff files with the same name? 2008-09-20 18:28 maze, I don't know if this belongs in vfs 2008-09-20 18:29 maze, maybe in ten years 2008-09-20 18:29 polyinstantiation is the huge goal to get Linux to be a true MLS system 2008-09-20 18:29 yes, and we can do it at the filesystem level so we should 2008-09-20 18:29 cut out as much bs as possible 2008-09-20 18:29 there is no existing model 2008-09-20 18:29 if we have this, govt/mil wont even look at solaris trusted extensions anymore 2008-09-20 18:29 why is polyinstantiation such a big feature? 
2008-09-20 18:30 versioning is already a kind of polyinstantiation 2008-09-20 18:30 yes, true 2008-09-20 18:31 that's a fair point that we get it almost for free... 2008-09-20 18:31 nearly 2008-09-20 18:31 except for storing different sec levels on different back end devices 2008-09-20 18:31 it's the buliding block of MultiLevelSecurity, which ends up giving you a higher classification of a system, so we'll be applicable to go into crazier installations like subs, planes, and other deep dark holes 2008-09-20 18:31 linux in subs? 2008-09-20 18:31 my god 2008-09-20 18:31 maze, that's where we just give them the xattrs 2008-09-20 18:31 that's almost reason enough to not do this 2008-09-20 18:31 HAVE YOU SEEN THE CODE? 2008-09-20 18:31 ;-) 2008-09-20 18:31 bushman, ooh 2008-09-20 18:31 not yet, that's why we're pushing for more functionality ;) 2008-09-20 18:32 ok I'm going to actually read the wikipedia article now 2008-09-20 18:32 but after skating 2008-09-20 18:32 sun is getting critically low 2008-09-20 18:32 oh you wont find much on this on the internet 2008-09-20 18:32 the linux code base sucks, I wouldn't want it anywhere near anything life-critical or nuclear 2008-09-20 18:32 well at least the unclass portion of the internet ;) 2008-09-20 18:33 the classified portions of the internet are predominantly porn and warez ;-) 2008-09-20 18:33 what do you want instead? trusted solaris 8? 2008-09-20 18:33 maze, so you want something worse there? 
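The visibility rule discussed above ("you don't even see the links unless they exist at or below your level") is the standard lattice dominance check. In Bell-LaPadula terms it is the "no read up" property.

Below is a minimal sketch under these assumptions:

- there are three levels;
- compartments are a bitmask;
- /file is unclassified (a point the discussion leaves open).

The types and `visible_entries` are illustrative only, not SELinux's or Tux3's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative label model: an ordered level plus a bitmask of compartments. */
enum level { UNCLASSIFIED, SECRET, TOP_SECRET };

struct label {
    enum level level;
    unsigned compartments;          /* one bit per need-to-know category */
};

/* a dominates b if a's level is at least b's and a holds every compartment b has. */
static bool dominates(struct label a, struct label b)
{
    return a.level >= b.level && (b.compartments & ~a.compartments) == 0;
}

struct labeled_entry { const char *name; struct label label; };

/* "No read up": collect the entries whose label the subject dominates, so a
 * directory listing simply omits names above the reader's clearance. */
static size_t visible_entries(struct label subject, const struct labeled_entry *in,
                              size_t n, const struct labeled_entry **out)
{
    assert(in != NULL || n == 0);
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (dominates(subject, in[i].label))
            out[k++] = &in[i];
    return k;
}
```

With `file`, `filesecret` and `filetopsecret` labelled unclassified, secret and top secret (no compartments), the results are:

- an unclassified subject sees one entry;
- a secret subject sees two;
- a top-secret subject sees all three.

This matches the breakdown in the log. Giving objects compartments that a subject's label lacks hides them even from a higher level.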
2008-09-20 18:34 I'd always thought the military had something home-grown, and tiny for the really important stuff 2008-09-20 18:34 this quickly becomes a question of lesser evils 2008-09-20 18:34 maze, teehee 2008-09-20 18:34 you know sucky performance, runs on well bug-cleared 486 cores 2008-09-20 18:34 bwahaha, i went to military gradschool with a bunch of officers, with very few exceptions they couldnt code their way out of a paper bag 2008-09-20 18:35 maze, from my skydiving years I remember when the miltary gave up deveoping and just bought sport chutes 2008-09-20 18:35 but has been gone through with a fine comb once a quarter for two dozen years 2008-09-20 18:35 that's disappointing 2008-09-20 18:35 this is what i'm trying to push, if we give them one big important piece, they should get off their ass and sponser massive code audits on the rest of linux 2008-09-20 18:35 I'm not aware of any commercial code that isn't bug-ridden 2008-09-20 18:36 hmm, I see 2008-09-20 18:36 now - that - seems like a worthwhile goal 2008-09-20 18:36 have they done that on solaris? 2008-09-20 18:37 here's the problem with 'secure' development: you end up hiregin people that have proper clearances, instead of people with mad skillz. but since it's all done behind well guarded doors, noone ever will know how much it sucks 2008-09-20 18:37 I wonder if the US is the only country doing anything serious in this area... I mean there's so many other countries, and I don't buy them all buying off of the us... 2008-09-20 18:37 clearances are a very good way to hide incompetence :/ 2008-09-20 18:37 I'm clear of clearances ;-) 2008-09-20 18:37 and none of these people will ever share 2008-09-20 18:37 cuz it's national security 2008-09-20 18:37 national insecurity rather 2008-09-20 18:38 clearances are a good way into a depression 2008-09-20 18:38 tell me about it :/ 2008-09-20 18:38 the more you learn, the more surprised you are you're (we're) still alive 2008-09-20 18:38 you ever seen dr. 
strangelove? 2008-09-20 18:39 might have, don't recall 2008-09-20 18:39 was that a bond movie? 2008-09-20 18:39 old 60s movie from Kubric, about atomic bomb and the doomsday device 2008-09-20 18:39 no don't recall 2008-09-20 18:40 then my jokes wouldnt mean much ;) 2008-09-20 18:40 ah 2008-09-20 18:40 well, the worst part is none of what you've written sounds like a joke... 2008-09-20 18:41 but in general it's a very strange environment, that's why i'm here, so we can bridge some of the well audited/tested code into domains that are usually very very separated from the rest of the world 2008-09-20 18:41 i was about to crack a joke, but realized it'd sound very goofy and out of place unless you know dr strangelove 2008-09-20 18:41 go ahead 2008-09-20 18:41 anyway 2008-09-20 18:42 nah, doesnt matter 2008-09-20 18:42 I'm assuming this means your a civilian contractor with necessary clearances working for some military/defense whatever arm of the us gov? 2008-09-20 18:43 not a contractor, actual govt worker 2008-09-20 18:43 Do the true military guys treat civilians like sh*t? 2008-09-20 18:43 some 2008-09-20 18:43 ah 2008-09-20 18:43 in gradschool we got a lot of it, cuz it was 300 military officers and 10 civillians 2008-09-20 18:43 so it's got all that (and more) beautiful politics to live with 2008-09-20 18:44 so why civilian then and not military? 2008-09-20 18:44 who, me? 2008-09-20 18:44 in general, if almost everybody is military, why the pslit? 2008-09-20 18:44 ah, whatever, disregard... 2008-09-20 18:45 I'm not sure I even want to know how it all works ;-) 2008-09-20 18:45 or whether it works 2008-09-20 18:45 you dont, it's depressing shit 2008-09-20 18:45 so, Marcin - Polish roots? 2008-09-20 18:46 100% 2008-09-20 18:46 heh 2008-09-20 18:46 that's more than me then (unless I treat my roots as adding up to > 100) 2008-09-20 18:46 i thought you had a site from UJ 2008-09-20 18:47 I do 2008-09-20 18:47 but you know... 
standard story: 2008-09-20 18:47 oh that's weird 2008-09-20 18:47 conceived in France, born in Britain, grew up in Poland, kindergarten - grade 8 in Canada, high school and university in Poland, now working in California... 2008-09-20 18:48 weird stuff ;-) 2008-09-20 18:48 but, yeah, my Family's Polish, just done a lot of travelling. 2008-09-20 18:48 no wonder you end up working with shap, most of his friends in chicago were pollacks ;) 2008-09-20 18:49 Shap's in Santa Monica though, I'm in Mountain View 2008-09-20 18:49 well, work as in on tux 2008-09-20 18:49 I don't think I've ever actually met Shapor 2008-09-20 18:49 he's a fantastic character, love him dearly 2008-09-20 18:49 scary smart but without ego, which is strange 2008-09-20 18:50 this is growing to be a great crew 2008-09-20 18:50 yeah, way better than the other way round (ego without smarts) 2008-09-20 18:50 i wish i could code anywhere near what's needed here 2008-09-20 18:50 coding isn't actually the problem 2008-09-20 18:50 coding is trivial 2008-09-20 18:51 not to me, i'm a codeing retard 2008-09-20 18:51 the real problems are figuring out what the interfaces of the rest of the kernel are and how to obey them 2008-09-20 18:51 and what algos and data structures to use for what purpose 2008-09-20 18:51 and where 2008-09-20 18:51 yea that's a problem trying to merge yourself into a huge prexisting infrastructure 2008-09-20 18:52 there is actually very little coding and code involved ;-) 2008-09-20 18:52 i did little kernel coding in minix in gradschool, but that's it, what you guys are talking about gives me a headache 2008-09-20 18:53 the real problem for me at least - is the terrible lack of any documentation 2008-09-20 18:53 i'm just trying to steer it in the right direction, i'm more of a lobbyst/fanclub ;) 2008-09-20 18:53 and the documentation that is present is often either wrong or partial or outdated 2008-09-20 18:53 for the interfaces you mean? 
2008-09-20 18:53 yeah 2008-09-20 18:54 i'm setting up lxr for us 2008-09-20 18:54 cool 2008-09-20 18:54 trying to do some support work 2008-09-20 18:54 you're somewhere in south carolina right? 2008-09-20 18:54 it's mostly set up, but i did it wtihout the free text searches, and Daniel wanted it, so i gotta redo a big chunk 2008-09-20 18:54 yea, near Charleston 2008-09-20 18:54 oh, sad. 2008-09-20 18:55 eh no problem, gotta refresh myself on linux sysadmining, i'm kinda rusty, school and new job took me out of commision for 3yrs 2008-09-20 18:55 let's look up Charleston on the map 2008-09-20 18:55 look where civilisation ends, and it's right on the border ;) 2008-09-20 18:56 that's true 2008-09-20 18:56 it's on the coast ;-) 2008-09-20 18:56 right next to hollywood 2008-09-20 18:56 I'll be passing through hollywood on halloween 2008-09-20 18:57 hollywood? 2008-09-20 18:57 the one with movies or some other one? 2008-09-20 18:57 the one in LA 2008-09-20 18:58 yea, i gotta go bug shap again, we need to stage a gettogether 2008-09-20 18:58 man, SC is weird 2008-09-20 18:58 'NO SHIT 2008-09-20 18:58 there doesn't seem to be anything there judging from the map 2008-09-20 18:58 swamps with aligators providing free security around military basis 2008-09-20 18:58 lol 2008-09-20 18:58 and ghettos with poor people 2008-09-20 18:59 been to Atlanta, hated the wather 2008-09-20 18:59 weather 2008-09-20 18:59 i aint kidding, my base has a pet aligator named Charlie, he's like 14ft ;) 2008-09-20 18:59 oh, 2008-09-20 18:59 you have your own base? 2008-09-20 18:59 wow, you're high up ;-) 2008-09-20 19:00 not my personal one ;) 2008-09-20 19:00 oh, is it just one of your holdings? 
2008-09-20 19:00 oh yea, i'm big pimpin it ;) 2008-09-20 19:01 eh, every time I look at a map of the east coast of the US, I realize how I bloody don't know what the hell it's like 2008-09-20 19:01 the west coast is so easy: 2008-09-20 19:01 seattle, portland, san francisco, los angeles, san diego 2008-09-20 19:01 me neither, i've lived in chicago and cali, east coast is uncharted territory to me 2008-09-20 19:01 and you're done ;-) 2008-09-20 19:02 which also happens to be the 5 states: 2008-09-20 19:02 washington, oregon, north california, south california and mexico 2008-09-20 19:02 oh, ok, maybe a little off there ;-) 2008-09-20 19:03 and you should probably start with Vancouver... 2008-09-20 19:06 so how did you pick up this systems stuff? i read logs from last night, and your bash foo is out there 2008-09-20 19:07 mostly self taught 2008-09-20 19:07 really 2008-09-20 19:08 people coming out of european universities know a lot more than most american counterparts 2008-09-20 19:08 not having a life really helps ;-) 2008-09-20 19:08 here college is so common to do, it's more of an extension of highschool 2008-09-20 19:08 yeah, but most of my skills are stuff I picked up on my own 2008-09-20 19:08 indeed I have a MSc in Physics 2008-09-20 19:09 but I actually dropped out of univ, and then got back in and took 7 years to do it 2008-09-20 19:09 oh i'm back to not having a life, i had a life and it got too dramatic, thus new school/job 2008-09-20 19:09 because CS and running my own ISP was more fun and challenging and interesting 2008-09-20 19:09 (before you ask - mini-ISP for like 300 people) 2008-09-20 19:10 hehe, a friend of mine in poland did UJ too, molecular chemistry or something, took like 6yrs 2008-09-20 19:10 here if you're smart you get out in 3yrs, shap did it, i did like 3.5 2008-09-20 19:10 kinda a joke 2008-09-20 19:11 didnt learn shit about computers in college, that was mostly about hacking my way into coeds panties 2008-09-20 19:11 hey, that is at 
least worthwhile ;-) 2008-09-20 19:11 not in a long run 2008-09-20 19:12 I know 2008-09-20 19:12 i learned more by hanging out with shap hacking shit till 4am 2008-09-20 19:12 ah, so the two of you went to the same uni? 2008-09-20 19:13 no, different highschools, universities, just kept hanging out 2008-09-20 19:13 ah 2008-09-20 19:13 I think a lot of the truth here is that people finish college 2008-09-20 19:13 not university 2008-09-20 19:13 we met cuz he was running a bbs in the neighbourhood (which mattered back in the day of cost being distance dependent) 2008-09-20 19:13 and they end up with bachelors, not masters 2008-09-20 19:13 there's a big difference 2008-09-20 19:13 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-09-20 19:14 in US the difference between college and university is very blurry 2008-09-20 19:14 doesnt really mean much 2008-09-20 19:14 degrees do matter, sometimes quite a bit 2008-09-20 19:14 a lot of the colleges etc are really just lame local schools 2008-09-20 19:15 that's 2yr degrees 2008-09-20 19:15 more different stuff 2008-09-20 19:15 we have the same thing in Poland - most higher ed schools suck, but then they're mostly not called univseristies, and most good ones are. 2008-09-20 19:15 i was talking about 4yr degree 2008-09-20 19:15 bachelor's is 2 years? since when? I thought bachelors was 3-4 2008-09-20 19:15 no no 2008-09-20 19:15 with masters being another 2 on top of that 2008-09-20 19:16 associates is 2yrs 2008-09-20 19:16 bachelor is 4yrs 2008-09-20 19:16 masters is another 2 on top of that 2008-09-20 19:16 depends on program/school, etc 2008-09-20 19:16 but that's the general rundown 2008-09-20 19:16 so associates is ike worthless right? 
2008-09-20 19:16 associates is worthless, that's like something you want if you want to be a cop 2008-09-20 19:17 bachelor is the normal 4yr degree 2008-09-20 19:17 cops aren't worthless though :-) 2008-09-20 19:17 oh yea, they're my favorite people 2008-09-20 19:18 was actually talking about cops and how hard a job they have at work yesterday 2008-09-20 19:18 some people do the 2yr associates and then transfer to 4yr programs to save money cuz the local 'community colleges' are usually very very cheap 2008-09-20 19:18 so doing the 2 in local comm col? 2008-09-20 19:18 yea, 2here and 2there 2008-09-20 19:19 you end up with a normal 4yr bachelor degree 2008-09-20 19:19 yeah, why education (higher) is so expensive in the US is something I never understood 2008-09-20 19:19 college can be very expensive so it's a viable technique if you dont want to bury yourself in debt 2008-09-20 19:19 oh i worked at a private university for 5yrs, i can explain that one ;) 2008-09-20 19:20 yeah, a car per year in expenses - cute for someone who isn't working... 2008-09-20 19:20 yea, my housemate is a smart dude, very hard working too, but comes from a poorass family, even with scholarships he racked up like 50+k in debt for undergrad 2008-09-20 19:21 but at least it's a possibility, if you got talent but no cash, it's still doable 2008-09-20 19:21 right 2008-09-20 19:21 it takes a long time to pay that off afterwards 2008-09-20 19:22 unless you end up landing an awesome job 2008-09-20 19:22 these days most colleges are 30k/yr+, which is insane, you start life at -$100,000 2008-09-20 19:22 true, but... 
2008-09-20 19:22 you should also be earning an extra 20/30k per year because of them 2008-09-20 19:22 over the course of your entire life it adds up 2008-09-20 19:22 still sucks of course 2008-09-20 19:23 eh, college loans are relatively low interest, i know people who are 40yrs old and multimillionare, but still pay off their school loans to earn good credit ;) 2008-09-20 19:23 lol 2008-09-20 19:24 eh, some people i went to undergrad with got 2-3 degrees and sell cellphones for a living 2008-09-20 19:24 geeks have it made comparing to others 2008-09-20 19:25 in what sense? 2008-09-20 19:25 income after college? 2008-09-20 19:25 it's hard for us to be unemployed 2008-09-20 19:25 true 2008-09-20 19:25 if you really got little talent and just wanna do basic IT work you still can make 40-50k/yr easily 2008-09-20 19:26 if you got any sort of brains you get to 70k/yr quickly 2008-09-20 19:26 I'm assuming you're talking about east coast here 2008-09-20 19:26 first year as a high school teacher in good district in chicago payed like 28k 2008-09-20 19:27 yea, cali is a bit different 2008-09-20 19:27 we start higher, and go up easily 2008-09-20 19:28 and it's difficult to be unemplyed 2008-09-20 19:28 I spend something like 15K+ a year just to rent a studio... 2008-09-20 19:28 and then add utilities on top of that. 
2008-09-20 19:28 heh, move to SC, you can buy a new house for 130k ;) 2008-09-20 19:29 I can buy half a studio in a sh*tty area for that 2008-09-20 19:29 oh i know, i lived in monterey for 2yrs, a shed on my block sold for 570k 2008-09-20 19:29 it was an old house the side of my current living room 2008-09-20 19:30 s/side/size/ 2008-09-20 19:30 right 2008-09-20 19:30 the california fixer upper for just slightly less than a million 2008-09-20 19:30 me and shap were thinking of moving to cali at the end of 2000, just before the market exploded 2008-09-20 19:31 imploded 2008-09-20 19:31 looked at prices of houses, median price was like 890k 2008-09-20 19:31 but the weather's nice ;-) 2008-09-20 19:31 and there's lots of job opportunities 2008-09-20 19:31 and interesting ones at that 2008-09-20 19:31 i'm a geek, i live in rooms in AC so there's no condensation on servers ;) 2008-09-20 19:32 the weather outside is irrelevant unless it's a hurricane 2008-09-20 19:32 I have an AC - haven't turned it on in 2+ years I've lived here 2008-09-20 19:32 truthfully didn't turn the heater on last winter either 2008-09-20 19:33 i havent turn OFF AC since march, it's 100F and 100% humidity at all times 2008-09-20 19:33 sh*tty gas stove, more trouble than it's worth 2008-09-20 19:33 south sucks weatherwise 2008-09-20 19:33 yep 2008-09-20 19:33 i understand now why they had slavery here, noone's gonna volunteer their ass to be outside 2008-09-20 19:34 lol... 2008-09-20 19:34 it's ahorrible joke, but it's true 2008-09-20 19:34 yeah 2008-09-20 19:34 the non-white folks are probably more used to the heat... 2008-09-20 19:35 and can thus better deal with it 2008-09-20 19:35 i have to mow my lawn at 8am, cuz by 9:30am it's too hot for my pasty white ass not to get scorched 2008-09-20 19:48 hm that hasn't happened in a while X was chewing up half my ram even after closing firefox 2008-09-20 20:44 hrm does readdir work for anyone using tux3fuse ? 
2008-09-20 20:44 (not tux3fs) 2008-09-20 21:38 folks 2008-09-20 23:55 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-21 00:39 -!- pranith(7aa040b1@webchat.mibbit.com) has joined #tux3 2008-09-21 00:42 -!- Bobby(~Bobby@122.160.64.177) has joined #tux3 2008-09-21 00:49 hello 2008-09-21 00:49 :) 2008-09-21 00:53 hi 2008-09-21 00:53 hey shapor 2008-09-21 02:45 hey 2008-09-21 02:45 ACTION is back from a night of clubing in SD 2008-09-21 02:45 lubbing 2008-09-21 02:45 bah 2008-09-21 02:45 clubbing 2008-09-21 03:35 walk->estop -= walk->group--->count; <- true code 2008-09-21 07:43 -!- Kirantpatil(~kiran@122.167.211.253) has joined #tux3 2008-09-21 07:44 -!- Kirantpatil(~kiran@122.167.211.253) has left #tux3 2008-09-21 08:22 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-21 10:02 -!- pgquiles(~pgquiles@250.Red-79-144-194.staticIP.rima-tde.net) has joined #tux3 2008-09-21 10:25 ok, we have a pretty nice dleaf streamwise reader now 2008-09-21 10:25 next tricky issue is writing into the dleaf 2008-09-21 10:26 tricky because we're writing into the middle of a big glob of extents 2008-09-21 10:26 possibly truncating some at the beginning and end of the range of interest 2008-09-21 13:33 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-21 13:53 -!- mback(~mback@netblock-68-183-189-239.dslextreme.com) has joined #tux3 2008-09-21 14:08 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-21 15:07 -!- ajonat(~ajonat@190.48.117.81) has joined #tux3 2008-09-21 15:39 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-21 16:50 flipzout: dwalk_pack? 
2008-09-21 16:52 shapor, email just sent out 2008-09-21 16:52 to tux3 list 2008-09-21 16:52 ah still didn't get it 2008-09-21 16:53 lets you poke extents into a dleaf one at a time, and it builds up the group and entry index as you go 2008-09-21 16:53 or it will when it works, which is far away 2008-09-21 16:53 heading out 2008-09-21 17:33 -!- kbingham(~kbingham@92.10.66.117) has joined #tux3 2008-09-21 17:38 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-21 19:54 -!- ajonat(~ajonat@190.48.117.81) has joined #tux3 2008-09-21 20:00 hey 2008-09-21 20:11 -!- BSD(~bandan@38.117.250.152) has joined #tux3 2008-09-21 21:55 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-21 22:17 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-22 01:48 yo 2008-09-22 04:13 7/3: 1000033 => 777; 1000044 => 888; 1000099 => 999; 2008-09-22 04:13 8/3: 3000044 => 444 666; 3000055 => 555; 3000000 => ; <- result of first dwalk_pack attempt 2008-09-22 04:13 not bad, it actually added an entry 2008-09-22 04:13 messed it up, but.
2008-09-22 04:14 but bug hunt commences 2008-09-22 04:21 8/2: 3000044 => 444 666; 3000055 => 555 123 456; <- 3rd try = good 2008-09-22 04:21 now some more broken cases 2008-09-22 04:25 3000044 => 444 666; 3000055 => 555; 3000056 =>; <- 4th attempt, hmm seem to be missing something 2008-09-22 04:33 8/3: 3000044 => 444 666; 3000055 => 555; 3000056 => 123; <- result of fixing an off by one 2008-09-22 04:33 next boundary: add a new group 2008-09-22 04:37 9/1: 4000000 =>; <- not bad for a first attempt 2008-09-22 04:37 also overwrote the 0th extent of the leaf with the new group descriptor ;) 2008-09-22 04:46 9/1: 4000123 => 123 0 0 0; <- closer 2008-09-22 04:46 few extra zeros came from somewhere 2008-09-22 04:46 hmm 2008-09-22 04:47 ah, walk extend base needs to be bumped for the new group 2008-09-22 04:47 extent base 2008-09-22 04:48 ...by the group count of the current group 2008-09-22 04:49 err, no 2008-09-22 04:49 by the amount of the most recent entry limit 2008-09-22 04:50 9/1: 4000123 => 123; <- correct 2008-09-22 04:50 this is too easy 2008-09-22 04:50 you lamers who didn't rise to my sunday afternoon challenge should hang your heads ;) 2008-09-22 04:50 going to be a funny lkml post about this 2008-09-22 04:52 couple more boundaries to check 2008-09-22 04:52 next one: group count overflow 2008-09-22 04:55 8/7: 3000044 => 444 666; 3000055 => 555; 3001001 => 1; 3001002 => 1; 3001003 => 1; 3001004 => 1; 3001005 => 1; 2008-09-22 04:55 9/1: 3001006 => 1; <- works great 2008-09-22 04:55 first time 2008-09-22 04:55 now what?
2008-09-22 04:55 got to be more 2008-09-22 04:55 dwalk_pack looks too simple 2008-09-22 04:57 need to be able to add stuff in the middle of a dleaf, not just at the end I suppose 2008-09-22 04:57 though for now we can just re-append everything after the add point 2008-09-22 04:57 easy 2008-09-22 04:57 will serve for some time 2008-09-22 05:07 I know, I'll add some asserts on leaf full 2008-09-22 05:07 though of course it will never happen ;) 2008-09-22 05:18 ok, all properly and anally asserted 2008-09-22 05:18 now just about time to write dwalk_mock 2008-09-22 05:18 going to be the cute+funny subject of the lkml post 2008-09-22 05:19 hmm, maybe time for a checkin 2008-09-22 05:24 final score: 9 lines of the dwalk_pack prototype survived, 4 were changed, 12 were added not counting comments and asserts 2008-09-22 05:27 ok, dwalk_mock 2008-09-22 05:32 http://www.linuxtoday.com/infrastructure/2008092200135OSCY <- LOL 2008-09-22 05:32 really 2008-09-22 05:33 "Microsoft isn't the answer. Microsoft is the question. 'Linux' or 'No' is the answer." -- some wag 2008-09-22 05:33 where is everybody? 
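Writing into the middle of a run of extents takes several steps:

- truncate the extent that overlaps the start of the written range;
- drop any extent the write fully covers;
- trim the extent that overlaps the end;
- as a first cut, re-append everything after the add point.

Below is a minimal sketch of that idea on a plain sorted array of extents, not the dleaf group/entry layout. `struct extent` and `write_extent` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: `count` blocks starting at logical block `logical`,
 * stored at physical block `physical`. */
struct extent { uint64_t logical, physical; uint32_t count; };

/* Map [x.logical, x.logical + x.count) to x.physical, replacing any old mapping.
 * in[] is sorted by logical address and non-overlapping; the result is written to
 * out[] (capacity cap) and its length returned. Everything from the write point on
 * is simply re-appended, the "good enough for now" strategy. */
static size_t write_extent(const struct extent *in, size_t n, struct extent x,
                           struct extent *out, size_t cap)
{
    assert(x.count > 0);
    uint64_t start = x.logical, end = x.logical + x.count;
    size_t k = 0;
    int placed = 0;

    for (size_t i = 0; i < n; i++) {
        struct extent e = in[i];
        uint64_t e_end = e.logical + e.count;

        if (e_end <= start) {                   /* wholly before the write */
            assert(k < cap);
            out[k++] = e;
            continue;
        }
        if (e.logical < start) {                /* overlaps the start: keep its head */
            assert(k < cap);
            out[k++] = (struct extent){ e.logical, e.physical, (uint32_t)(start - e.logical) };
        }
        if (!placed) {                          /* the new extent goes in once */
            assert(k < cap);
            out[k++] = x;
            placed = 1;
        }
        if (e_end > end) {                      /* extends past the end: keep its tail */
            uint64_t skip = e.logical < end ? end - e.logical : 0;
            assert(k < cap);
            out[k++] = (struct extent){ e.logical + skip, e.physical + skip,
                                        (uint32_t)(e.count - skip) };
        }
        /* an extent wholly inside [start, end) emits nothing: it is dropped */
    }
    if (!placed) {                              /* write lies after every extent */
        assert(k < cap);
        out[k++] = x;
    }
    return k;
}
```

For example, take extents [0,10) at 100 and [10,20) at 200, and write 10 blocks at logical 5. The result is:

- a 5-block head of the first extent;
- the new extent;
- the last 5 blocks of the second extent, now at physical 205.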
2008-09-22 05:34 it's only 5 in the morning 2008-09-22 05:34 wimps 2008-09-22 06:10 ACTION is alive 2008-09-22 07:09 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-09-22 07:14 ok, dwalk_mock is a fait accompli 2008-09-22 07:15 almost within striking zone of putting this together in inode.c 2008-09-22 07:15 need to think about extenty implications now 2008-09-22 07:15 extents just about here :) 2008-09-22 07:15 => let the benchmark wars begin 2008-09-22 07:35 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-09-22 07:36 -!- ceatinge(~ceatinge@veryclever.net) has joined #tux3 2008-09-22 09:41 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-22 09:50 -!- nataliep_(~nataliep@207.47.98.129.static.nextweb.net) has joined #tux3 2008-09-22 10:10 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-22 10:15 ok, prolly about time to try dropping these gizmos into inode.c 2008-09-22 10:15 and see if they make extents happen 2008-09-22 10:16 I wonder if we need extent.c 2008-09-22 11:24 we got filemap.c instead 2008-09-22 11:24 extents are a detail 2008-09-22 12:30 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-22 12:44 -!- nataliep_(~nataliep@207.47.98.129.static.nextweb.net) has joined #tux3 2008-09-22 13:29 hey flipz 2008-09-22 13:30 hi 2008-09-22 13:30 got extents working or something like that ? 2008-09-22 13:37 check out the code and try it 2008-09-22 14:18 I'll do so a bit later. I've been reading a bit of the code online 2008-09-22 14:18 flipz: you know there's a #linuxfs channel, right ? 2008-09-22 14:24 with clueful stuff happening? 
2008-09-22 15:39 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-22 15:40 -!- openblast(~quassel@static.230.173.47.78.clients.your-server.de) has left #tux3 2008-09-22 16:47 -!- ajonat(~ajonat@190.48.103.186) has joined #tux3 2008-09-22 17:29 ok, here we go, final push for an extents prototype 2008-09-22 20:41 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-22 21:12 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-22 21:39 -!- Bushman(~marcin@c-76-23-106-132.hsd1.sc.comcast.net) has joined #tux3 2008-09-22 21:50 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-09-23 01:55 folks 2008-09-23 02:41 -!- tim_dimm_(~mobile@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-23 02:42 Greetings 2008-09-23 03:17 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-09-23 04:49 -!- kbingham(~kbingham@92.10.168.77) has joined #tux3 2008-09-23 05:43 -!- Bobby(~Bobby@nat-inn.mentorg.com) has joined #tux3 2008-09-23 05:44 hellooo 2008-09-23 05:44 anyone here mind explaining extents to me??? 2008-09-23 05:45 -!- pranith(ca4bcee2@webchat.mibbit.com) has joined #tux3 2008-09-23 06:14 -!- smitht(~chatzilla@ool-182f94db.dyn.optonline.net) has joined #tux3 2008-09-23 06:16 Hello all, I am eager to look over the kernel port of Tux3, however I cannot get git to clone the repo at http://phunq.net/ddtree, has anyone done this successfully? Thanks. Trevor 2008-09-23 09:50 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-23 10:45 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-23 10:48 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-23 11:58 "Results 1 - 10 of about 413,000 for tux3" 2008-09-23 12:12 maze, there? 
2008-09-23 12:12 yes ;-( 2008-09-23 12:12 your suggestion re putting the xattrs up high in files 2008-09-23 12:13 hmm 2008-09-23 12:13 really a good idea, except for the problem of deepening the radix tree 2008-09-23 12:13 I'd think that would be relatively insignificant 2008-09-23 12:13 every file with xattrs would have a 6 level radix tree 2008-09-23 12:13 it's significant I think 2008-09-23 12:14 compared to 1 level for most files now 2008-09-23 12:14 however 2008-09-23 12:14 if the radix tree code were to be modified to have one level of btree at the top level... 2008-09-23 12:14 perhaps optionally 2008-09-23 12:14 then it gets practical 2008-09-23 12:14 I'm not quite sure why it would get so deep so quickly? 2008-09-23 12:15 it's a radix tree 2008-09-23 12:15 if you map something at the top of the space, the entire tree has to deepen 2008-09-23 12:15 you could use signed 2008-09-23 12:15 ah 2008-09-23 12:15 but 2008-09-23 12:15 well, with an offset 2008-09-23 12:16 so the only cost is to look up and maintain that offset 2008-09-23 12:16 yes, that improves it without much stress 2008-09-23 12:16 good idea 2008-09-23 12:16 I guess, my problem is, I'm not quite sure what a radix tree is 2008-09-23 12:16 well... you can take a run at your first core kernel hack then, after we have tux3's requirements to justify it 2008-09-23 12:17 ah 2008-09-23 12:17 better clear that up 2008-09-23 12:17 I'd assumed you'd be using a normal number-of-leaves-determines-depth type of tree here 2008-09-23 12:17 fundamental tool of software engineering 2008-09-23 12:17 is a radix tree what the cpu uses for tlb? 2008-09-23 12:17 radix tree is to btree as bucket sort is to quicksort 2008-09-23 12:17 sorta 2008-09-23 12:17 radix tree is directly indexed at each level instead of binsearched 2008-09-23 12:18 therefore needs no keys 2008-09-23 12:18 index nodes are twice as compact 2008-09-23 12:18 right, so isn't the cpu virt to phys mapping a radix tree?
2008-09-23 12:18 probe is much faster 2008-09-23 12:18 but you still pay a l1 cache pressure penalty for each level of the tree 2008-09-23 12:18 it is, and that is a problem in some cases 2008-09-23 12:18 64 bit machines struggle with it 2008-09-23 12:19 ok, so I just wan't aware of the name 'radix tree' 2008-09-23 12:19 no other name for it that I know 2008-09-23 12:19 like radix sort 2008-09-23 12:19 wasn't aware of any name ;-) 2008-09-23 12:19 you were looking for some reason to split something into two for 32-bit machines 2008-09-23 12:19 can't remember what it was... something about 1EB of space? 2008-09-23 12:20 was that for total fs? 2008-09-23 12:20 hmm, must have been 2008-09-23 12:20 I wasn't either until "wind" showed up on #kernelnewbies with his plan of changing the page cache from a hash to something better 2008-09-23 12:20 I recall he considered half a dozen types of trees I'd never heard of 2008-09-23 12:20 he? 2008-09-23 12:20 the one thing he was sure of, all of them would be better than a hash 2008-09-23 12:20 he was right 2008-09-23 12:20 John Levon 2008-09-23 12:20 h 2008-09-23 12:21 ah 2008-09-23 12:21 after that, showed very little interest in Linux 2008-09-23 12:21 not sure why 2008-09-23 12:21 heading out with team for Lunch. Will be back in 30 minutes. 2008-09-23 12:21 bye 2008-09-23 12:21 ;-) 2008-09-23 12:48 -!- data(~data@echo489.server4you.de) has joined #tux3 2008-09-23 12:51 -!- data(~data@echo489.server4you.de) has joined #tux3 2008-09-23 12:55 -!- pgquiles(~pgquiles@42.Red-83-39-60.dynamicIP.rima-tde.net) has joined #tux3 2008-09-23 13:12 back 2008-09-23 13:23 crawl-8.cuill.com ;-) 2008-09-23 13:23 that was fast 2008-09-23 13:52 folks 2008-09-23 14:10 flipz: talking about implementing xattrs ? 
2008-09-23 14:10 already ahve 2008-09-23 14:10 you should check out the code 2008-09-23 14:19 yeah, been looking at it 2008-09-23 14:19 and thinking about the allocation map problem a bit 2008-09-23 14:20 not sure what kind of tree structure to use to represent areas on the disk that might have a certain amount of free blocks 2008-09-23 14:27 it's a bitmap 2008-09-23 14:27 not a tree 2008-09-23 14:27 there will be accelerator bits in the pointers to bitmap blocks eventually 2008-09-23 14:27 to know which blocks have how much space free 2008-09-23 14:27 for now it's just a big linear block map 2008-09-23 14:34 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-23 15:22 big linear scan then ? 2008-09-23 15:37 for now 2008-09-23 15:37 it certainly won't stay that way 2008-09-23 15:37 even now 2008-09-23 15:37 the scan is directed 2008-09-23 15:37 to a preferred target area 2008-09-23 18:11 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-23 18:35 -!- ajonat(~ajonat@190.48.115.242) has joined #tux3 2008-09-23 19:16 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-23 19:52 -!- RalucaM(~ral@scout-9.cnds.jhu.edu) has joined #tux3 2008-09-23 19:56 ACTION is working for a deadline for tomorrow so it will not be present at today's lesson :-( 2008-09-23 20:00 :-( 2008-09-23 20:08 -!- ajonat(~ajonat@190.48.119.175) has joined #tux3 2008-09-23 20:22 -!- Kirantpatil(~kiran@122.167.199.68) has joined #tux3 2008-09-23 20:22 -!- Kirantpatil(~kiran@122.167.199.68) has left #tux3 2008-09-23 20:29 no session at all today? 
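A minimal sketch of a linear free-block scan directed at a preferred target area. It assumes a hypothetical in-memory bitmap in which a set bit means the block is in use. The scan starts at the goal and wraps around to block 0 if nothing is free above it. balloc_from_goal() is illustrative, not tux3's allocator, and it has none of the planned accelerator bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bitmap helpers: bit set = block in use. */
static int map_test(const uint8_t *map, uint64_t block)
{
	return (map[block >> 3] >> (block & 7)) & 1;
}

static void map_set(uint8_t *map, uint64_t block)
{
	map[block >> 3] |= (uint8_t)(1u << (block & 7));
}

/* Linear scan directed at a preferred target: try goal, goal + 1, ...,
   wrapping to block 0, and stop after one full pass.
   Returns the allocated block, or -1 if the map is full. */
static int64_t balloc_from_goal(uint8_t *map, uint64_t nblocks, uint64_t goal)
{
	assert(goal < nblocks);
	for (uint64_t i = 0; i < nblocks; i++) {
		uint64_t block = goal + i < nblocks ? goal + i : goal + i - nblocks;
		if (!map_test(map, block)) {
			map_set(map, block);
			return (int64_t)block;
		}
	}
	return -1;
}
```

Accelerator bits would let the scan skip whole bitmap blocks that are known to be full, instead of testing every bit on the way to the goal.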
2008-09-23 20:29 teacher skipped class 2008-09-23 20:30 looks like it 2008-09-23 20:37 -!- macan(~chatzilla@159.226.41.129) has joined #tux3 2008-09-23 22:40 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-09-23 22:47 -!- tim_dimm_(~mobile@166.134.66.229) has joined #tux3 2008-09-23 23:15 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-23 23:26 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-23 23:35 -!- nataliep(~nataliep@207.47.98.129.static.nextweb.net) has joined #tux3 2008-09-23 23:38 -!- ajonat(~ajonat@190.48.127.55) has joined #tux3 2008-09-23 23:46 flake 2008-09-23 23:46 flipz: = flake 2008-09-23 23:59 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-24 00:10 -!- shapor kicked bh ("be nice") 2008-09-24 00:25 -!- pranith(ca4bcee2@webchat.mibbit.com) has joined #tux3 2008-09-24 00:25 hello 2008-09-24 00:45 hi pranith 2008-09-24 00:46 sleep time... 2008-09-24 01:17 -!- pgquiles(~pgquiles@42.Red-83-39-60.dynamicIP.rima-tde.net) has joined #tux3 2008-09-24 01:26 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-09-24 01:32 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-09-24 01:52 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-24 01:58 hello 2008-09-24 01:58 anyone here help me on setting up a fuse tux3? 2008-09-24 01:59 MaZe: ? 2008-09-24 01:59 hmm 2008-09-24 02:00 ? 2008-09-24 02:00 oh, tux3 fuse... I'm probably not the best person, seeing as I haven't tried running it yet 2008-09-24 02:00 hmm, ok 2008-09-24 02:00 I'm mucking around with the kernel and haven't compiled tux3 yet even... still working on (a) learning about the kernel, (b) writing options parsing, and (c) getting a solid debug environment, and most importantly... I have work ;-( 2008-09-24 02:01 oh.. 2008-09-24 02:01 speaking of options parsing... 
2008-09-24 02:01 flipz: -o mkfs ofcourse, but can also potentially do -o resize=#blocks 2008-09-24 02:01 hmm 2008-09-24 02:01 pranith: sure 2008-09-24 02:02 i can help 2008-09-24 02:02 hey shapor 2008-09-24 02:02 cool 2008-09-24 02:02 flipz: and support that as a -o remount option as well 2008-09-24 02:02 MaZe: which code are u working on? 2008-09-24 02:02 will have to look see how remount is implemented 2008-09-24 02:03 right now? generic kernel space options parser 2008-09-24 02:03 shapor: what do i need to do? 2008-09-24 02:03 well tux3fs since ls works with it 2008-09-24 02:03 readdir doesn't work in tux3fuse 2008-09-24 02:03 MaZe: ok 2008-09-24 02:03 well, at least not for me, i was going to try a new version of fuse 2008-09-24 02:03 it seems to be returning some data but somethings not right 2008-09-24 02:03 shapor: hmm. where can i get the code? and any install options? 2008-09-24 02:04 tux3fuse is the low level api one 2008-09-24 02:04 its in the repo 2008-09-24 02:04 user/test 2008-09-24 02:04 right next to all the other files 2008-09-24 02:04 ok 2008-09-24 02:05 shapor: can i have the link please... 2008-09-24 02:05 :) 2008-09-24 02:05 its in the mercurial repository at http://tux3.org/tux3 2008-09-24 02:06 hg clone http://tux3.org/tux3 2008-09-24 02:06 then it will be in tux3/user/test 2008-09-24 02:06 just run "make makefs" 2008-09-24 02:07 and "make debug" 2008-09-24 02:07 and it should be mounted on /tmp/test 2008-09-24 02:07 ok 2008-09-24 02:44 anyone mind explaining something about extents to me ? :) 2008-09-24 02:45 i want to know how they help in addressing a larger disk area 2008-09-24 02:45 shapor: ? 2008-09-24 02:45 MaZe: ? 2008-09-24 02:45 hmm? 2008-09-24 02:45 any idea on extents? 2008-09-24 02:45 they don't really... 
they're really just a performance optimization 2008-09-24 02:45 instead of splitting a file into blocks 2008-09-24 02:46 and then storing the location of every single block 2008-09-24 02:46 ok, u map a chunk of block to a single extent... 2008-09-24 02:46 you split the file into linear sequence of blocks (linear in the sense they are ordered sequentially one after the other on disk) 2008-09-24 02:46 this way you only need to store a mapping (file blocks N..M) -> (disk block X..Y) 2008-09-24 02:47 hmm, ok 2008-09-24 02:47 or indeed just a map of [N] -> [X] is enough (since M and Y are -1 of the next set) 2008-09-24 02:47 thus you have a file as a a set of extents (linear set of blocks), instead of as a set of blocks 2008-09-24 02:47 hmm, nice 2008-09-24 02:48 since you want files to be linear as much as possible (and thus contain few extents) [hence running defragmentors, etc in windows] 2008-09-24 02:48 you will usually end up with relatively few extents, and thus it takes less space to store and can have better performance (especially if well implemented) than just a naive block list 2008-09-24 02:50 basically you get both space savings on disk (and in memory), and better performance, due to having/needing to read in fewer disk blocks (which have a tendency to get pretty randomly distributed) than in a block based fs 2008-09-24 02:51 hmm, thats the main advantage then 2008-09-24 02:51 not addressing a large disk area 2008-09-24 02:56 shapor: problem with fuse :( 2008-09-24 02:56 pranith: whats the problem 2008-09-24 02:57 permission denied 2008-09-24 02:57 doing what 2008-09-24 02:57 cd /tmp/test 2008-09-24 02:58 there are no permissions on the test directory 2008-09-24 02:58 it just say 'd?????' on a ls -l 2008-09-24 02:58 btw, make debug did not return 2008-09-24 02:59 to the command prompt 2008-09-24 03:02 oh i've seen that before 2008-09-24 03:02 hmm 2008-09-24 03:02 are you running ls as root? 2008-09-24 03:02 what do i do? 
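A sketch of the compact [N] -> [X] form mentioned above. It assumes a hypothetical struct extent_key that stores only a logical start N and a physical start X. Each extent's length is implied by the next entry's logical start, and a final sentinel entry marks the end of the mapped range. The last disk block of an extent is therefore X plus that length minus one. Holes are ignored here, and this is not tux3's dleaf format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact extent entry: only the starts are stored. */
struct extent_key {
	uint64_t logical;   /* first file block of the extent [N] */
	uint64_t physical;  /* first disk block of the extent [X] */
};

/* e[0..count-2] are extents sorted by logical start; e[count-1] is a
   sentinel whose logical field is the end of the mapped range.
   Returns the disk block for file block 'block', or -1 if unmapped. */
static int64_t map_block(const struct extent_key *e, size_t count, uint64_t block)
{
	assert(count >= 1);
	if (block < e[0].logical || block >= e[count - 1].logical)
		return -1;
	/* Binary search keeping e[lo].logical <= block < e[hi].logical. */
	size_t lo = 0, hi = count - 1;
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;
		if (e[mid].logical <= block)
			lo = mid;
		else
			hi = mid;
	}
	return (int64_t)(e[lo].physical + (block - e[lo].logical));
}

/* Last disk block of extent i, derived from the next entry's start. */
static uint64_t extent_last_physical(const struct extent_key *e, size_t i)
{
	return e[i].physical + (e[i + 1].logical - e[i].logical) - 1;
}
```

For a 10-block file stored as blocks 0..3 at disk 100 and blocks 4..9 at disk 500, the table is { {0, 100}, {4, 500}, {10, 0} }. That is three entries instead of ten block pointers.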
2008-09-24 03:02 nope 2008-09-24 03:02 try that 2008-09-24 03:02 as a normal user 2008-09-24 03:02 ok 2008-09-24 03:02 try root 2008-09-24 03:03 ohk, got the permissions now as root 2008-09-24 03:03 but... why? 2008-09-24 03:03 because the fuse implementation is *very* rough around the edges 2008-09-24 03:03 shouldn't fuse be accessible as a normal user 2008-09-24 03:04 yes, patches welcome :) 2008-09-24 03:04 :) 2008-09-24 03:04 if you hit ctrl-c 2008-09-24 03:04 and re run it as a normal user 2008-09-24 03:04 ./tux3fs /tmp/testdev /tmp/test -f 2008-09-24 03:04 instead of make debug 2008-09-24 03:04 i think it will work 2008-09-24 03:04 hmm 2008-09-24 03:05 i dint run make debug as root... 2008-09-24 03:05 but it has a sudo command in it 2008-09-24 03:05 oh 2008-09-24 03:05 :) 2008-09-24 03:05 the fuse implementation is really just meant as a test harness 2008-09-24 03:05 we know there are a lot of bugs in it 2008-09-24 03:05 ok 2008-09-24 03:06 i was trying to figure out why readdir isn't working right in tux3fuse (the low level one) 2008-09-24 03:06 oh 2008-09-24 03:06 since i think porting to that will make the kernel port a little easier 2008-09-24 03:06 since the api is more vfs-ish 2008-09-24 03:06 remains to be seen, i havne't had a lot of time to work on it recently 2008-09-24 03:07 you part time work on tux3?? 2008-09-24 03:08 wow! tux3 is gplv3!! 2008-09-24 03:08 how do u get it into the kernel? 2008-09-24 03:15 i think it will be v2 in kernel 2008-09-24 03:15 i work nights and weekends when i have time 2008-09-24 03:16 its not my day job ;) 2008-09-24 03:16 its not anyones day job afaik 2008-09-24 03:17 hmm, what does flips do? 2008-09-24 03:18 i mean what does he do for a day job :D 2008-09-24 03:18 i know he develops tux3... 2008-09-24 03:21 i think hes been mostly working on tux3 recently 2008-09-24 03:33 pranith: regarding gplv2 vs 3. 
This is solved by the remark 2008-09-24 03:33 * By contributing changes to this file you grant the original copyright holder 2008-09-24 03:33 * the right to distribute those changes under any license. 2008-09-24 03:41 :) 2008-09-24 03:41 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-24 03:41 its just what reiser did :D 2008-09-24 03:41 hope flips doesnt flip out like reiser :P 2008-09-24 03:52 -!- pgquiles(~pgquiles@42.Red-83-39-60.dynamicIP.rima-tde.net) has joined #tux3 2008-09-24 04:37 what is tux3fs.c and tux3fuse.c? 2008-09-24 04:37 any difference between the two? 2008-09-24 04:41 tux3fuse uses the low level fuse api 2008-09-24 04:41 and readdir is currently broken 2008-09-24 04:49 yeah 2008-09-24 04:49 ok 2008-09-24 04:56 any idea on how i can debug this code?? 2008-09-24 04:56 printf's are nice.. but gdb rocks 2008-09-24 04:56 :D 2008-09-24 05:03 what does printf("'%.*s'", namelen, name) do? 2008-09-24 05:03 im not sure of this printf specifier :( 2008-09-24 05:58 hello 2008-09-24 05:59 anybody here? 2008-09-24 07:22 pranith: A field width or precision, or both, may be indicated by an asterisk `*' or an asterisk followed by one or more decimal digits and a `$' instead of a digit string. In this case, an int argument supplies the field width or precision. A negative field width is treated as a left adjustment flag followed by a positive field width; a negative precision is treated as though it were missing. If a single format directi 2008-09-24 07:22 (from man 3 printf) 2008-09-24 07:23 (on mac :P) 2008-09-24 07:23 RzM|Away: thanks :) 2008-09-24 07:23 RzM|Away: mac is lame 2008-09-24 07:23 :P 2008-09-24 07:23 so the namelen will tell how much of the name to show :D 2008-09-24 07:23 hmm, roger that 2008-09-24 07:24 The field width 2008-09-24 07:24 An optional decimal digit string (with nonzero first digit) specifying a minimum field width. 
If the converted value has fewer characters than the field width, it will be padded with spaces on the left (or right, if the left-adjustment flag has been given). Instead of a decimal digit string one may write `*' or `*m$' (for some decimal integer m) to specify that the field width is given in the next argument, or in the 2008-09-24 07:24 from linux 2008-09-24 07:24 I use both ;-) 2008-09-24 07:25 and like both :P 2008-09-24 07:25 hmm 2008-09-24 07:25 i use ubuntu dressed up as mac 2008-09-24 07:25 so i have the best of both worlds 2008-09-24 07:25 :D 2008-09-24 07:25 ;-) 2008-09-24 07:26 any idea why readdir fails in tux3fuse? 2008-09-24 07:26 check this out http://xkcd.com/424/ 2008-09-24 07:26 I didn't have a chance to try that :( 2008-09-24 07:27 got to go 2008-09-24 07:27 hmm, xkcd.. lol 2008-09-24 07:27 have fun 2008-09-24 07:27 u too 2008-09-24 07:27 bbye 2008-09-24 07:57 -!- pgquiles_(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-09-24 08:10 -!- kbingham(~kbingham@92.22.1.228) has joined #tux3 2008-09-24 08:10 flipz: -o mkfs ofcourse, but can also potentially do -o resize=#blocks <- ah yes 2008-09-24 08:20 MaZe, and for that matter, -o remount,resize=#blocks 2008-09-24 08:21 maze, see super_operations->remount_fs 2008-09-24 08:21 there we go, that will have to do for a tux3 U session this time 2008-09-24 08:45 -!- flips(~phillips@phunq.net) has joined #tux3 2008-09-24 08:48 tim_dimm, dcc chat doesn't work with my connection 2008-09-24 08:48 probably have to configure my router or something 2008-09-24 08:49 I'm timing out on the other connection 2008-09-24 08:50 that's because speakeasy went down 2008-09-24 08:50 just reconnect 2008-09-24 08:50 can u do a 11am call friday ? 
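A small sketch of the printf("'%.*s'", namelen, name) idiom asked about above. The likely reason for it here is that directory entry names carry an explicit length rather than a trailing NUL, as in ext2-style dirents. format_name() is a hypothetical helper. The '*' precision takes its value from the int argument namelen, and printf reads at most that many bytes of name, so the name does not need to be NUL-terminated.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: quote a name that carries an explicit length and
   no trailing NUL. At most namelen bytes of name are read. */
static int format_name(char *buf, size_t size, const char *name, int namelen)
{
	assert(namelen >= 0);
	return snprintf(buf, size, "'%.*s'", namelen, name);
}
```

By contrast, '*' used as a field width pads the output instead of truncating it. printf("[%*s]", 5, "ab") prints "[   ab]", and a negative width left-justifies, so printf("[%*s]", -5, "ab") prints "[ab   ]".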
2008-09-24 08:50 ACTION points at the query chat 2008-09-24 09:45 -!- pgquiles__(~pgquiles@42.Red-83-39-60.dynamicIP.rima-tde.net) has joined #tux3 2008-09-24 10:08 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-24 10:51 -!- ceatinge(~ceatinge@veryclever.net) has left #tux3 2008-09-24 11:08 -!- ceatinge(~ceatinge@72.232.13.50) has joined #tux3 2008-09-24 11:32 -!- pgquiles_(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-09-24 11:40 -!- pranith(7aa040b1@webchat.mibbit.com) has joined #tux3 2008-09-24 11:40 heya 2008-09-24 11:40 hi 2008-09-24 11:40 anyone here? 2008-09-24 11:40 hey flips 2008-09-24 11:40 nope 2008-09-24 11:41 was going through the fuse code today... 2008-09-24 11:41 whats wrong with the readdir function? 2008-09-24 11:41 I haven't looked at it 2008-09-24 11:41 shapor started to look at it 2008-09-24 11:41 getting it working in tux3fs was tricky 2008-09-24 11:41 ohk 2008-09-24 11:41 hmm 2008-09-24 11:42 the readdir internal interface is super crappy 2008-09-24 11:42 tux3fuse.c and tux3fs.c are different... 2008-09-24 11:42 on of the worst interfaces anywhere, for anything 2008-09-24 11:42 that's right 2008-09-24 11:42 is it because fuse uses the fuse api? 2008-09-24 11:42 or something like that? 2008-09-24 11:42 no idea 2008-09-24 11:42 hmm, ok 2008-09-24 11:43 you might try emailing tero 2008-09-24 11:43 who checked in the original tux3fuse patch 2008-09-24 11:43 hmm, ok 2008-09-24 11:43 cc tux3 list if you do please 2008-09-24 11:43 yeah, sure 2008-09-24 11:43 :) 2008-09-24 11:43 you can also bug shapor 2008-09-24 11:43 if you like 2008-09-24 11:43 shapor is fun to bug 2008-09-24 11:44 hehe 2008-09-24 11:44 shapor is busy with his day job 2008-09-24 11:44 he told me 2008-09-24 11:44 so better to try tero for that :) 2008-09-24 11:44 "busy" 2008-09-24 11:44 it's all relative 2008-09-24 11:45 hmm 2008-09-24 11:45 can i have the encrypted email id of tero? 
2008-09-24 11:47 ok, got it 2008-09-24 12:08 -!- pranith(7aa040b1@webchat.mibbit.com) has joined #tux3 2008-09-24 12:17 debugging of filemap extents finally begins 2008-09-24 12:17 was hard code to write 2008-09-24 12:18 I found it hard 2008-09-24 12:18 prolly easy for shapor though ;) 2008-09-24 12:36 irob's thoughtful initializing of buffers to "deadly data" has the unintended side effect of preventing valgrind from detecting access to unitialized buffer data 2008-09-24 12:37 ACTION removes 2008-09-24 12:37 ah, there we go, lots of valgrind complaints 2008-09-24 12:38 -!- ceatinge(~ceatinge@72.232.13.50) has joined #tux3 2008-09-24 13:42 -!- pgquiles__(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-09-24 14:26 -!- bh(~billh@ip68-107-26-122.sd.sd.cox.net) has joined #tux3 2008-09-24 14:26 folks :) 2008-09-24 14:57 -!- pgquiles_(~pgquiles@42.Red-83-39-60.dynamicIP.rima-tde.net) has joined #tux3 2008-09-24 15:46 hmm, linux-fsdevel server is slower than molasses in January 2008-09-24 15:47 last time I post to it without ccing lkml, I think 2008-09-24 15:47 flips: sk8 30 for new dads 2008-09-24 15:47 still there? 2008-09-24 15:47 rolling out now 2008-09-24 15:47 I'll head out in about 15 2008-09-24 15:47 meet at the pier? 
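One way to keep a debug poison pattern without hiding uninitialized reads from valgrind is to make it switchable. The sketch uses a hypothetical alloc_block_data() and an assumed 0xdd pattern. Memcheck tracks whether each byte has ever been written. A memset() marks every byte as defined, so later reads of stale poison look legitimate to it. Valgrind also offers the client request VALGRIND_MAKE_MEM_UNDEFINED() from <valgrind/memcheck.h>, which can re-mark a poisoned region as undefined.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define POISON_BYTE 0xdd   /* hypothetical "deadly data" pattern */

/* Allocate buffer data and fill it with the poison pattern only on request.
   Left unfilled, memcheck can report reads of bytes that were never
   written; after memset(), every byte counts as defined to memcheck. */
static unsigned char *alloc_block_data(size_t size, int poison)
{
	unsigned char *data = malloc(size);
	assert(data != NULL);
	if (poison)
		memset(data, POISON_BYTE, size);
	return data;
}
```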
2008-09-24 15:48 got to home by 5 2008-09-24 15:48 pier-ish 2008-09-24 15:48 should work 2008-09-24 15:48 k 2008-09-24 15:48 rollin' 2008-09-24 15:48 cu 2008-09-24 15:48 no crashes 2008-09-24 15:48 certainly no blue screen of death 2008-09-24 15:49 ;-) 2008-09-24 15:49 don't skate in the dark 2008-09-24 15:49 u haven;'t lived until you bomb latigo in the moonlight 2008-09-24 15:50 hmm, sounds like "haven't died" 2008-09-24 15:51 hey flips 2008-09-24 15:51 hey 2008-09-24 15:51 got to go 2008-09-24 15:51 folks were complaining last night about you being delinquent 2008-09-24 15:51 and now you're abandoning us 2008-09-24 15:55 bh, complaining about what you get for free is not normally considered good style 2008-09-24 15:55 besides, I haven't seen you at one of the sessions 2008-09-24 15:55 I'm also joking if you haven't figured that out by now 2008-09-24 15:56 the internet doesn't see the smile 2008-09-24 15:56 never does 2008-09-24 15:56 you say it like you're never going to see the sun rise again or something like that 2008-09-24 15:57 nothing in santa monica is *that* awful 2008-09-24 16:18 -!- mingming_(~mingming@bi01p1.co.us.ibm.com) has joined #tux3 2008-09-24 16:24 ACTION waves to flips 2008-09-24 16:33 flips: ping me when you get back 2008-09-24 16:34 shapor: the core code is in dleaf.c right ? 
2008-09-24 17:36 aw, missed mingming 2008-09-24 17:40 bh, ping 2008-09-24 17:41 the core code is indeed in dleaf.c 2008-09-24 17:56 hey shapor, I'm forced to link my ugly tux3 page and cr*ppy design doc from the lkml post because your version of the design doesn't have headings in most of it 2008-09-24 17:57 flips: yeah, looking over it now 2008-09-24 17:57 er was about an hour ago 2008-09-24 17:58 it's starting to get interesting now 2008-09-24 17:58 with the dwalk stuff 2008-09-24 17:58 yeah, it's going to get more and more complicated as well 2008-09-24 17:58 this step is making it less complicated 2008-09-24 17:58 general resizing isn't implemented yet which includes truncation 2008-09-24 17:59 a couple of big ugly functions will disappear in a week or so 2008-09-24 17:59 all of that needs to be integrated into atomic logging as well, not really that easy 2008-09-24 17:59 roughly zero impact on dleaf 2008-09-24 17:59 talking about the refactoring ? 2008-09-24 18:00 logging impact 2008-09-24 18:00 logging just needs to be done once for all forms of btree, its generic 2008-09-24 18:00 president bush is giving a speech btw 2008-09-24 18:00 hope he chokes 2008-09-24 18:01 he isn't your favorite president of all time ? 2008-09-24 18:01 least in fact 2008-09-24 18:01 can't think of a worse one 2008-09-24 18:01 well, you should start putting in blank functions for atomic logging 2008-09-24 18:01 anyway 2008-09-24 18:01 #offtopic 2008-09-24 18:02 real functions for atomic logging will go in around the time of the kernel port 2008-09-24 18:02 so no need to fire blanks 2008-09-24 18:03 the blanks are helpful for other folks 2008-09-24 18:03 got to get back to my post 2008-09-24 18:03 nearly running out of time 2008-09-24 18:03 and you can start training people to think in terms of it 2008-09-24 18:03 if somebody steps up to implement it I'll put in some stubs 2008-09-24 18:03 otherwise... 
2008-09-24 18:04 got other things to do 2008-09-24 18:04 like train up some hackers for the port 2008-09-24 18:05 you should mark it so that folks understand your thinking regarding it 2008-09-24 18:05 I'm tell'n you that it's going to be useful for me and will indicate a direction with your implementation 2008-09-24 18:07 it's been described in a number of posts 2008-09-24 18:08 we can get links up on the page 2008-09-24 18:27 that would be good 2008-09-24 18:54 -!- tim_dimm_(~mobile@166.135.68.85) has joined #tux3 2008-09-24 19:22 "Tux3 gets a high speed atom smasher" -- just posted to lkml 2008-09-24 20:10 -!- Kirantpatil(~kiran@122.167.206.163) has joined #tux3 2008-09-24 20:21 -!- amey(~amey@116.73.35.180) has joined #tux3 2008-09-24 20:21 -!- amey(~amey@116.73.35.180) has left #tux3 2008-09-24 21:37 -!- amey(~amey@116.73.35.180) has joined #tux3 2008-09-24 21:38 -!- amey(~amey@116.73.35.180) has left #tux3 2008-09-24 22:16 -!- Kirantpatil(~kiran@122.167.197.205) has joined #tux3 2008-09-24 22:16 hello list.. 2008-09-24 22:16 i tried to get junkfs 2008-09-24 22:16 but link http://m.a.z.e.pl/junkfs.tar.gz is not working.. 2008-09-24 22:17 any one point me to the right location.. 2008-09-24 22:30 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-24 22:37 hello.. 2008-09-24 22:38 howdy 2008-09-24 22:45 could you please get me the link from where i can download junkfs ? 2008-09-24 22:45 as http://m.a.z.e.pl/junkfs.tar.gz is not working.. 
2008-09-25 00:10 that was fun 2008-09-25 00:11 beach cruiser on the strand at midnight 2008-09-25 00:11 can't get more california than that 2008-09-25 00:49 -!- pgquiles(~pgquiles@42.Red-83-39-60.dynamicIP.rima-tde.net) has joined #tux3 2008-09-25 01:04 -!- ajonat(~ajonat@190.48.120.246) has joined #tux3 2008-09-25 01:07 bah 2008-09-25 01:18 flipsout: enjoy it before winter hits 2008-09-25 01:53 -!- pgquiles_(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-09-25 03:19 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-25 04:19 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-09-25 05:00 -!- pgquiles_(~pgquiles@42.Red-83-39-60.dynamicIP.rima-tde.net) has joined #tux3 2008-09-25 05:08 -!- Kirantpatil(~kiran@122.167.196.127) has joined #tux3 2008-09-25 05:09 -!- Kirantpatil(~kiran@122.167.196.127) has left #tux3 2008-09-25 11:09 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-25 11:11 file_bwrite: block write <0:0> 2008-09-25 11:11 <<< extent 0x0/4 >>> 2008-09-25 11:11 0 entry groups: 2008-09-25 11:11 file_bwrite: fill gap at 0x0/4 2008-09-25 11:11 balloc_extent_from_range: balloc 4 blocks from [0/1000] 2008-09-25 11:11 balloc extent -> [2/4] 2008-09-25 11:12 extent writing almost happening 2008-09-25 11:12 lots of combinatorics to take care of 2008-09-25 11:53 segs: 0x2/4 (1) 2008-09-25 11:53 dwalk_mock: add entry key 0x4 after 0x0 2008-09-25 11:53 dwalk_mock: add extent 0x2/4 2008-09-25 11:53 getting closer... 2008-09-25 11:57 -!- amey(~amey@116.73.35.180) has joined #tux3 2008-09-25 12:02 -!- amey(~amey@116.73.35.180) has left #tux3 2008-09-25 12:33 -!- tim_dimm_(~mobile@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-25 12:57 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-09-25 14:07 fyi, I might be late for tux3 U.. hopefully not... 
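The debug output above shows an extent being allocated from a range ("balloc 4 blocks from [0/1000]"). The following is a rough sketch of that kind of search over a hypothetical bitmap in which a set bit means the block is in use. It scans the range for the first run of free blocks that is long enough, then marks that run as used. This is a first-fit linear scan for illustration and is not tux3's balloc_extent_from_range(). It also gives up rather than returning a shorter run.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bitmap helpers: bit set = block in use. */
static int map_test(const uint8_t *map, uint64_t block)
{
	return (map[block >> 3] >> (block & 7)) & 1;
}

static void map_set(uint8_t *map, uint64_t block)
{
	map[block >> 3] |= (uint8_t)(1u << (block & 7));
}

/* First fit: find the first run of 'want' free blocks inside
   [start, start + count), mark it used, and return its first block,
   or -1 if no run that long exists in the range. */
static int64_t alloc_run_in_range(uint8_t *map, uint64_t start, uint64_t count, unsigned want)
{
	assert(want > 0);
	uint64_t run = 0;
	for (uint64_t block = start; block < start + count; block++) {
		if (map_test(map, block)) {
			run = 0;
			continue;
		}
		if (++run == want) {
			uint64_t first = block + 1 - want;
			for (uint64_t b = first; b <= block; b++)
				map_set(map, b);
			return (int64_t)first;
		}
	}
	return -1;
}
```

With blocks 0 and 1 already in use, a request for 4 blocks from a 32-block range returns 2, and a second identical request returns 6.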
2008-09-25 14:23 bah 2008-09-25 15:23 -!- Ryback_(~ulisses@201.82.39.16) has joined #tux3 2008-09-25 17:03 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-25 18:49 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-25 18:55 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-25 19:14 -!- Kirantpatil(~kiran@122.167.212.31) has joined #tux3 2008-09-25 19:14 -!- Kirantpatil(~kiran@122.167.212.31) has left #tux3 2008-09-25 19:33 -!- ajonat(~ajonat@190.48.120.246) has joined #tux3 2008-09-25 19:56 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-25 19:57 I'll miss tonight's Tux3 U 2008-09-25 19:57 looking forward to reading the logs 2008-09-25 20:02 present 2008-09-25 20:02 although were's everyone else 2008-09-25 20:02 i'm here 2008-09-25 20:03 just got back from an offsite 2008-09-25 20:03 whiskey, steak, and guns ;-) 2008-09-25 20:04 okay, so I chose wine instead of whiskey 2008-09-25 20:04 so that makes it 2008-09-25 20:04 wine, steak and guns ... 2008-09-25 20:05 and I might be getting the order a little mixed up since I'm a little tini-iny-bit buzzed 2008-09-25 20:05 so it might just have been 2008-09-25 20:05 guns, steak, and wine... 2008-09-25 20:05 hmm 2008-09-25 20:05 where's the teacher? 2008-09-25 20:09 heh, tonight you get to teach instead 2008-09-25 20:10 nope 2008-09-25 20:10 I'm close to calling myself drunk... I need a nap. 2008-09-25 20:11 flipsout: ping pong? 2008-09-25 20:12 I believe Razvan was supposed to be the teacher this time around 2008-09-25 20:12 but he's RzM|Away, apparently afk 2008-09-25 20:13 maze 2008-09-25 20:13 here 2008-09-25 20:13 ok 2008-09-25 20:13 cool 2008-09-25 20:13 where were we 2008-09-25 20:13 any requests? 
2008-09-25 20:13 last one was bio xfrs 2008-09-25 20:13 then last tuesday we skipped 2008-09-25 20:13 right, conducted by maze 2008-09-25 20:13 it was good 2008-09-25 20:14 now tux3fs has a rather nice generic set of bio fns 2008-09-25 20:14 an async and a sync bio transfer flavor 2008-09-25 20:14 right 2008-09-25 20:14 fully general, except maybe it could take some alloc flags 2008-09-25 20:14 alloc flags? 2008-09-25 20:14 yes, like how hard the kernel should try to satisfy a request 2008-09-25 20:15 you will see that functions like kmalloc take gfp flags 2008-09-25 20:15 memory wise or io wise? 2008-09-25 20:15 "gfp: get free pages" 2008-09-25 20:15 memory wise 2008-09-25 20:15 well 2008-09-25 20:15 is coupled to io 2008-09-25 20:15 in an incestuous way 2008-09-25 20:15 most of the time, the kernel cache will be just about full 2008-09-25 20:16 what we have now I believe asks for memory in a 'can sleep' way 2008-09-25 20:16 except for right after boot, or after unmounting a volume, say, which invalidates a bunch of cache 2008-09-25 20:16 unless you specify GFP_ATOMIC, it is always "can sleep" 2008-09-25 20:16 also when you delete a dvd you just checksummed ;-) 2008-09-25 20:16 so that io transfers can take place and other things can run while waiting for memory to get free 2008-09-25 20:17 we have __NOFAIL as a gfp flag 2008-09-25 20:17 just means it will try for infinity... 2008-09-25 20:17 it means, under no circumstances return without completing the allocation 2008-09-25 20:17 until it suceeds 2008-09-25 20:18 yes 2008-09-25 20:18 and what could prevent it from succeeding? 2008-09-25 20:18 asking for 100M on 50M machine 2008-09-25 20:18 true 2008-09-25 20:18 or 120M machine with 20+M already allocated 2008-09-25 20:18 or not enough memory of a specific type 2008-09-25 20:18 or on a 200M machine on which 195M has leaked 2008-09-25 20:19 (ie. 
asking for low memory, when only high mem is free) 2008-09-25 20:19 also true 2008-09-25 20:19 [or dma16 or dma32] 2008-09-25 20:19 but the most common reason is: when memory is full of dirty pages that cannot be written out for some reason 2008-09-25 20:19 in general it is a bug 2008-09-25 20:20 in general, memory can always be allocated in kernel, by kicking out some cache 2008-09-25 20:20 so writing out dirty pages should not need to allocate memory, since otherwise it can deadlock? 2008-09-25 20:20 or you need to have a pre-allocated pool of temporary pages 2008-09-25 20:20 I believe the kernel even provides such features 2008-09-25 20:21 exactly 2008-09-25 20:21 you nailed that 2008-09-25 20:21 in fact, this is an unsolved problem in linux kernel 2008-09-25 20:21 or it is solved, but the fix is not in mainline 2008-09-25 20:21 see bio-throttle 2008-09-25 20:22 there has been an attempt to fix the problem by limiting total memory that is allowed to be dirty in kernel 2008-09-25 20:22 "dirty limits" 2008-09-25 20:22 complex, fragile, and doesn't work 2008-09-25 20:22 but has been good for creating lots of bugfixing activity lately 2008-09-25 20:22 anyway 2008-09-25 20:22 enough on memory for now? 2008-09-25 20:23 I think so... 2008-09-25 20:23 this is just something to be aware of off? 2008-09-25 20:23 let's get back to __copy2 2008-09-25 20:23 get you thinking about it, yes 2008-09-25 20:23 and some practical facts about GFP_ flags to memory allocators 2008-09-25 20:24 when you allocate a bio, there is an attempt made to provide a pre-allocated pool, so in theory a bio alloc will never fail 2008-09-25 20:25 in practice, it can slow to a craw as the pre-allocated pool only gaurantees 2 bios 2008-09-25 20:25 and it often gets into that corner 2008-09-25 20:25 hmm, so should you keep a couple pre-alloced bios for yourself? 
2008-09-25 20:26 youcan maintain your own pool, yes 2008-09-25 20:26 perhaps a good idea when the kernel is in the broken state it is 2008-09-25 20:26 extra complexity 2008-09-25 20:26 see the "mempool" mechanism 2008-09-25 20:27 is it worth it? 2008-09-25 20:27 better is to fix the bug 2008-09-25 20:27 it works for some situations 2008-09-25 20:27 its messy 2008-09-25 20:28 much harder to fix bugs, then to work around them ;-) 2008-09-25 20:28 the first requires understanding the entire system 2008-09-25 20:28 the second only the way it affects you 2008-09-25 20:28 true 2008-09-25 20:28 we can return to that issue 2008-09-25 20:29 it is fully understood, but not by everybody 2008-09-25 20:29 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2063 <- _2copy 2008-09-25 20:29 I'm totally mystified by the name, 2copy 2008-09-25 20:29 as we all are 2008-09-25 20:29 speaking of all 2008-09-25 20:29 how many of us are here? 2008-09-25 20:29 I'm feeling lonely 2008-09-25 20:30 [and drunk...] 2008-09-25 20:30 heh 2008-09-25 20:30 we'll keep it light then 2008-09-25 20:30 and short 2008-09-25 20:30 I'm attempting to get feeling drunk ;) 2008-09-25 20:30 you're well ahead it would seem 2008-09-25 20:30 heh 2008-09-25 20:31 yea, 2 glasses of white (some pinot), and 2 of red (not sure what it was), plus steak, plus an afternoon at a gun range 2008-09-25 20:31 the gun range made you drunk I presume 2008-09-25 20:31 wine before or after shooting? 2008-09-25 20:31 nah, that was first, and was fun ;-) 2008-09-25 20:31 the range was first 2008-09-25 20:32 afterward you rode around and shot up a few stop signs? 2008-09-25 20:32 nah, we left the range gun-less 2008-09-25 20:32 just checking 2008-09-25 20:32 we then invaded an italian restaurant in downtown mountain view 2008-09-25 20:33 castro street? 
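A userspace sketch of the reserve-pool idea behind the kernel's mempool. It assumes a hypothetical struct pool and a try_alloc callback that stands in for an allocator that can fail under memory pressure. Its shape follows mempool_alloc()/mempool_free(): try the normal allocator first, fall back to the preallocated reserve, and refill the reserve before freeing memory back to the system. When the reserve is empty and the caller is allowed to sleep, the real mempool_alloc() waits for an element to be returned. This sketch just returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct pool {
	void **reserve;               /* preallocated elements */
	unsigned min_nr, curr_nr;     /* reserve capacity and current fill */
	size_t size;                  /* element size */
	void *(*try_alloc)(size_t);   /* normal allocator; may fail */
};

/* Stand-in for an allocator under heavy memory pressure. */
static void *always_fail(size_t size)
{
	(void)size;
	return NULL;
}

static int pool_init(struct pool *p, unsigned min_nr, size_t size, void *(*try_alloc)(size_t))
{
	p->reserve = calloc(min_nr, sizeof *p->reserve);
	if (p->reserve == NULL)
		return -1;
	p->min_nr = min_nr;
	p->curr_nr = 0;
	p->size = size;
	p->try_alloc = try_alloc;
	/* Fill the reserve up front, while memory is still available. */
	while (p->curr_nr < min_nr) {
		void *e = malloc(size);
		if (e == NULL)
			return -1;
		p->reserve[p->curr_nr++] = e;
	}
	return 0;
}

static void *pool_alloc(struct pool *p)
{
	void *e = p->try_alloc(p->size);
	if (e != NULL)
		return e;
	if (p->curr_nr > 0)
		return p->reserve[--p->curr_nr];
	return NULL;   /* the kernel version would sleep here instead */
}

static void pool_free(struct pool *p, void *e)
{
	assert(e != NULL);
	if (p->curr_nr < p->min_nr) {
		p->reserve[p->curr_nr++] = e;   /* top up the reserve first */
		return;
	}
	free(e);
}
```

Passing malloc as try_alloc gives the normal case. always_fail shows that a pool with min_nr = 2 can still hand out exactly two elements under total memory pressure, which is the same guarantee the chat attributes to the bio pool.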
2008-09-25 20:33 yep 2008-09-25 20:34 ok, 2copy 2008-09-25 20:34 right 2008-09-25 20:34 seems we're pretty much alone 2008-09-25 20:35 the basic scheme is: alloc page; ->prepare write; copy data onto it; ->commit_write 2008-09-25 20:35 the -> are calls into the filesystem 2008-09-25 20:36 interesting 2008-09-25 20:36 what's the purpose of the prepare? 2008-09-25 20:36 the channel log will be preserved for posterity 2008-09-25 20:36 verify there's enough disk space, etc? 2008-09-25 20:36 I've alwasy wonder about that 2008-09-25 20:36 yes including the comments about wine 2008-09-25 20:36 for a partial page write, the prepare does a read before write 2008-09-25 20:36 otherwise it seems pretty useless 2008-09-25 20:36 I think it is useless 2008-09-25 20:37 but it has been in linux since eternity, which is an argument for it staying another eternity 2008-09-25 20:37 how do you know if it's a partial or full page write? 2008-09-25 20:37 see the parameters passed to it 2008-09-25 20:37 and where they come from 2008-09-25 20:37 ah 2008-09-25 20:37 this comes from the file pos and write len 2008-09-25 20:38 2159 status = a_ops->prepare_write(file, page, offset, offset+bytes); 2008-09-25 20:38 so there may be a partial page at the beginning and one at the end 2008-09-25 20:38 so I'm assuming that the 3rd and 4th paramts 2008-09-25 20:38 are 0,4096 if we're writing a full page 2008-09-25 20:38 pretty dumb to have the ->prepare on every page when only two per transfer need the special treatment 2008-09-25 20:38 oh, moment 2008-09-25 20:38 3rd 0 2008-09-25 20:38 do we call prepare_write, commit_write per page 2008-09-25 20:38 otherwise right 2008-09-25 20:38 or on page ranges 2008-09-25 20:39 per page 2008-09-25 20:39 dumb 2008-09-25 20:39 actually, this whole part of the kernel sucks pretty hard 2008-09-25 20:39 why just 3rd 0, why not 4th PAGE_SIZE (4096)? 
2008-09-25 20:40 4th is normally page_size, yes 2008-09-25 20:40 ok 2008-09-25 20:40 3rd is zero normally because it's an offset 2008-09-25 20:40 in the page 2008-09-25 20:40 right hence the 0,4096 above I was asking about 2008-09-25 20:40 see the flush_dcache page 2008-09-25 20:41 ah 2008-09-25 20:41 sorry, read wrong 2008-09-25 20:41 oki 2008-09-25 20:41 the dcache flush is a noop on x86 2008-09-25 20:41 some arches need it 2008-09-25 20:41 mips I think 2008-09-25 20:41 what's the purpose? 2008-09-25 20:41 tlb hackery? 2008-09-25 20:41 could not swear to that 2008-09-25 20:41 also not really clear to me 2008-09-25 20:41 it's like L1 cache 2008-09-25 20:42 that has to be explicitly flushed 2008-09-25 20:42 why... another matter 2008-09-25 20:42 seems like braindamage to design a processor that doesn't know how to flush its cache 2008-09-25 20:42 but people do it, they have their reasons I suppose 2008-09-25 20:42 maybe the asm code can be much more efficient on some archs if you assume explicit flushes on any change 2008-09-25 20:43 put that one aside to bother the mips maintainer about 2008-09-25 20:43 there is some sparse kernel doc on the subject 2008-09-25 20:43 but there is a general principle here: just because your code works on x86 does not mean it works 2008-09-25 20:44 hmm 2008-09-25 20:44 same is true if all your spinlocks work, because you compiled with smp disabled 2008-09-25 20:44 so how do you test on the dozen+ archs linux supports? 2008-09-25 20:44 get users to report errors? 2008-09-25 20:44 that's the question isn't it? 2008-09-25 20:44 after testing on the 2-3 you have access to? 
2008-09-25 20:44 well, I can test smp 32 and 64 bit x86 2008-09-25 20:44 you try to be aware of the issues and write using the generic apis that work on every arch 2008-09-25 20:44 I could probably get my hands on power32 and maybe alpha 2008-09-25 20:45 and eventually, somebody with that arch will hit your bug and complain 2008-09-25 20:45 but that's about it 2008-09-25 20:45 right... 2008-09-25 20:45 but... 2008-09-25 20:45 it's good to test on a couple different arches 2008-09-25 20:45 bugs like that are damn near impossible to track down 2008-09-25 20:45 big/little endian 2008-09-25 20:45 is any of the archs the most difficult to program for? 2008-09-25 20:45 and if one can find it, maybe something that has to do explicit dcache flush and other such horrors 2008-09-25 20:45 (I know alpha has the most lenient memory cache coherency model) 2008-09-25 20:45 sparc maybe 2008-09-25 20:46 sparc is pretty horrible 2008-09-25 20:46 who still has sparc machines? 2008-09-25 20:46 pretty much complete absence of atomic instructions 2008-09-25 20:46 dave miller 2008-09-25 20:46 sparc maintainer 2008-09-25 20:46 sun has the niagara box 2008-09-25 20:46 well, hopefully the maintainer does ;-) 2008-09-25 20:46 but it's true, sparc is nearly dead 2008-09-25 20:46 arm 2008-09-25 20:47 arm is embedded 2008-09-25 20:47 on the rise 2008-09-25 20:47 it's a big constituency these days 2008-09-25 20:47 easy to find, hard to find with a lot of ram or power or disk 2008-09-25 20:47 would testing in emulators work? 2008-09-25 20:47 if it has such a great mips/watt ratio you'd expect to see it in hpc 2008-09-25 20:48 some sort of qemu or something? 2008-09-25 20:48 but it's not there 2008-09-25 20:48 makes me wonder 2008-09-25 20:48 about that mips/watt ratio 2008-09-25 20:48 of arm? 2008-09-25 20:48 yes 2008-09-25 20:48 arm is good for stuff which needs high mips 2008-09-25 20:48 but rarely 2008-09-25 20:48 possibly you can test in emulation 2008-09-25 20:48 ie.
high peak, but mostly idle 2008-09-25 20:48 I think qemu is x86 only 2008-09-25 20:48 i've been doing a lot of stuff on amd geode and intel atom if that helps, i can test something, they're low end but still powerful 2008-09-25 20:48 but those are x86 aren't they? 2008-09-25 20:49 bushman, they' 2008-09-25 20:49 bushman, they're x86 arch 2008-09-25 20:49 but testing is _always_ useful 2008-09-25 20:49 x86 is a sick arch... but it's so dominant 2008-09-25 20:49 true 2008-09-25 20:49 POS86 2008-09-25 20:49 hmm no idea what that is 2008-09-25 20:50 you'll decode it eventually ;) 2008-09-25 20:50 are we done with _2copy? 2008-09-25 20:50 oh piece of 2008-09-25 20:50 balance_dirty_pages_ratelimited(mapping); <- attempt to limit kernel dirty pages 2008-09-25 20:50 so it goes a page at a time right? 2008-09-25 20:50 nasty thing 2008-09-25 20:50 yes, yuck 2008-09-25 20:51 and even then it's a mess 2008-09-25 20:51 there are many different flavors of similar kinds of io transfer loops 2008-09-25 20:51 in filemap.c 2008-09-25 20:51 take a browse and enjoy some of them 2008-09-25 20:52 oh, look at that vmtruncate at the end 2008-09-25 20:52 scary stuff 2008-09-25 20:52 why is there so much of it? 2008-09-25 20:52 much of what? 2008-09-25 20:52 copy loops?
code ;-) 2008-09-25 20:52 badly designed 2008-09-25 20:52 or not designed at all 2008-09-25 20:52 just grows 2008-09-25 20:53 changes in response to bug reports 2008-09-25 20:53 it feels like we have multiple interfaces/apis for everything 2008-09-25 20:53 including performance bug reports 2008-09-25 20:53 you're starting to get a feeling for it 2008-09-25 20:53 and eventually none of them get fully tested 2008-09-25 20:53 it's not unmanageable, just unconscionable 2008-09-25 20:53 at least not in all the myriad of combinations 2008-09-25 20:54 they get pretty well tested 2008-09-25 20:54 hmm 2008-09-25 20:54 I _think_ pretty much all buffer writes get funneled through _2copy 2008-09-25 20:54 though I haven't completely read through since this thing landed 2008-09-25 20:54 here's a question then 2008-09-25 20:54 how would I go about tracing a syscall 2008-09-25 20:55 seeing exactly which kernel funcs 2008-09-25 20:55 linux trace toolkit 2008-09-25 20:55 got called in what order with what params? 2008-09-25 20:55 puts probes into the kernel 2008-09-25 20:55 dprobe? kprobe? 2008-09-25 20:55 http://www.opersys.com/LTT/ 2008-09-25 20:55 kprobe 2008-09-25 20:55 now part of ltt I think 2008-09-25 20:55 hmm, so that's the 2nd time you've mentioned ltt 2008-09-25 20:56 it's good I take it?
2008-09-25 20:56 I haven't used it 2008-09-25 20:56 I should 2008-09-25 20:56 but it's the only game in town 2008-09-25 20:56 I think 2008-09-25 20:56 latest news is 2004 2008-09-25 20:56 I think that may be because it got at least partially merged 2008-09-25 20:57 http://ltt.polymtl.ca/ 2008-09-25 20:57 moved 2008-09-25 20:58 right 2008-09-25 20:58 it's current 2008-09-25 20:58 to 2.6.27-rc7 2008-09-25 20:58 current to yesterday or so ;) 2008-09-25 20:58 I should try it, there are no doubt many times when it could have saved me time 2008-09-25 20:59 patch-2.6.27-rc7-lttng-0.26.tar.bz2 25-Sep-2008 16:05 177K - right 2008-09-25 21:00 we should have looked at grab_cache_page in _2copy 2008-09-25 21:00 another poorly named function 2008-09-25 21:00 but important, and it will serve as our introduction to the page cache api 2008-09-25 21:00 Find or create a page at the given pagecache position. Return the locked 2038 * page. This function is specifically for buffered writes. 2008-09-25 21:00 one of the worst apis in the kernel ;) 2008-09-25 21:01 2038? 2008-09-25 21:01 line # 2008-09-25 21:01 oh 2008-09-25 21:01 what does page cache position mean? 2008-09-25 21:01 index within the cache for a particular inode 2008-09-25 21:01 so many logical pages offset in the file 2008-09-25 21:01 so a page cache position is a superblock:inode:offset triplet? 2008-09-25 21:02 just inode:offset 2008-09-25 21:02 because inode->sb 2008-09-25 21:02 ah 2008-09-25 21:02 so now inode #, but instead inode ptr 2008-09-25 21:02 yes 2008-09-25 21:02 s/now/not/ 2008-09-25 21:02 the "page cache" is in fact not a single cache 2008-09-25 21:02 maybe it was at one time 2008-09-25 21:03 but now it is a radix tree that hangs off of each inode 2008-09-25 21:03 giving you an idea maybe how bloated things get with lots of small files 2008-09-25 21:03 so page cache is effectively per inode?
2008-09-25 21:03 and what a bad idea sysfs is, which uses files and all the cache stuff that goes with it, to communicate tiny, 4 byte quantities, to the kernel 2008-09-25 21:04 page cache is per inode 2008-09-25 21:04 not effectively, absolutely 2008-09-25 21:04 why this split to per inode level? 2008-09-25 21:05 we think it's a good idea 2008-09-25 21:05 doesn't it make it harder to find what to free when memory runs low? 2008-09-25 21:05 no, because all the pages are linked together via an lru list 2008-09-25 21:05 but anyway 2008-09-25 21:05 lru is probably a bad idea 2008-09-25 21:05 lru = least recently used 2008-09-25 21:05 yes 2008-09-25 21:05 self organizing list 2008-09-25 21:06 simple minded 2008-09-25 21:06 just for the folks reading this later 2008-09-25 21:06 not very effective, especially since we mostly bypass it 2008-09-25 21:06 who bypasses it? 2008-09-25 21:06 we do 2008-09-25 21:06 in writeout for example 2008-09-25 21:06 it's mostly per-inode using the inode dirty lists 2008-09-25 21:07 we, as in tux 2008-09-25 21:07 or we as in fs drivers? 2008-09-25 21:07 there as in linuxen 2008-09-25 21:07 we as in linux penguins 2008-09-25 21:08 hmm 2008-09-25 21:08 the lru has exactly one purpose: to decide which page to evict next 2008-09-25 21:08 we mess with the lru idea so much that we don't get good decisions on that 2008-09-25 21:09 what do you mean mess? 2008-09-25 21:09 all kinds of mess 2008-09-25 21:09 there is the concept of hot and cold end of the lru list 2008-09-25 21:09 does it evict both clean and dirty pages?
2008-09-25 21:10 and there is code to try to move pages to the hot or cold end of the list according to whether we think the page is hot or cold 2008-09-25 21:10 both clean and dirty 2008-09-25 21:10 actually only clean 2008-09-25 21:10 it cleans dirty pages 2008-09-25 21:10 and evicts clean pages 2008-09-25 21:10 yes, to prefer using hot pages, since those are likely to be in cache 2008-09-25 21:10 so we won't be wasting cache by using hot pages 2008-09-25 21:10 right, except I think we mostly blow chunks in deciding what will be hot 2008-09-25 21:11 since the cache of a hot page can replace the spot of the previous page 2008-09-25 21:11 yes, evicting pages that will be faulted in again immediately does no good, quite the contrary 2008-09-25 21:12 or read via filesystem operations 2008-09-25 21:12 ok, it's getting late 2008-09-25 21:12 true 2008-09-25 21:12 and I'm falling asleep 2008-09-25 21:12 see you 2008-09-25 21:12 ;-) 2008-09-25 21:13 should be time for questions now 2008-09-25 21:13 overtime 2008-09-25 21:13 and since I've been asking questions all session... 2008-09-25 21:13 I'll let other folks ask questions now 2008-09-25 21:13 next tuesday we will continue with grab_cache_page 2008-09-25 21:14 finally 2008-09-25 21:14 ;-) 2008-09-25 21:14 -!- ChanServ changed mode/#tux3 -> +o flips 2008-09-25 21:14 Topic for #tux3 is: Tux3 list membership roars past 100! ~ http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 p.m. Pacific Time ~ Next session: grab_cache_page and friends 2008-09-25 21:14 -!- flips changed mode/#tux3 -> -o shapor 2008-09-25 21:15 -!- flips changed mode/#tux3 -> -o flips 2008-09-25 21:15 yeah, tried changing that earlier and failed 2008-09-25 21:15 I'll unlock the topic 2008-09-25 21:15 nah 2008-09-25 21:15 anyway, let's find a bed 2008-09-25 21:16 -!- ChanServ changed mode/#tux3 -> +o flips 2008-09-25 21:16 Topic for #tux3 is: Tux3 list membership roars past 100! ~ http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 p.m.
Pacific Time ~ Next session: grab_cache_page and friends 2008-09-25 21:16 -!- flips changed mode/#tux3 -> -o flips 2008-09-25 21:16 hmm 2008-09-25 21:16 -!- ChanServ changed mode/#tux3 -> +o flips 2008-09-25 21:17 -!- flips changed topic to "http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 p.m. Pacific Time ~ Next session: grab_cache_page and friends" 2008-09-25 21:17 -!- flips changed mode/#tux3 -> -o flips 2008-09-25 21:38 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-26 01:28 folks 2008-09-26 01:28 -!- pranith(ca4bcee2@webchat.mibbit.com) has joined #tux3 2008-09-26 01:28 hello 2008-09-26 01:28 anyone here? 2008-09-26 01:52 -!- pgquiles(~pgquiles@42.Red-83-39-60.dynamicIP.rima-tde.net) has joined #tux3 2008-09-26 02:04 -!- pranith(ca4bcee2@webchat.mibbit.com) has joined #tux3 2008-09-26 02:04 hello 2008-09-26 02:44 -!- pgquiles(~pgquiles@42.Red-83-39-60.dynamicIP.rima-tde.net) has joined #tux3 2008-09-26 03:58 hola 2008-09-26 04:49 anyone here? 2008-09-26 05:17 -!- pranith(ca4bcee2@webchat.mibbit.com) has joined #tux3 2008-09-26 05:17 hmm 2008-09-26 05:23 well, i am here, but i will not be able to answer any questions :) 2008-09-26 05:23 and the rest is probably asleep. it's 5 am there or something like that 2008-09-26 05:25 hehe 2008-09-26 05:25 ok 2008-09-26 05:25 hmm 2008-09-26 06:06 -!- pranith(ca4bcee2@webchat.mibbit.com) has joined #tux3 2008-09-26 06:06 reading yesterday's logs 2008-09-26 06:06 since no one posted the tux u proceedings... 2008-09-26 06:06 i think im going to do that 2008-09-26 06:23 could you post those from tuesday too? 
2008-09-26 06:23 i think they are still missing 2008-09-26 07:13 hmm 2008-09-26 07:13 wait 2008-09-26 08:03 hmmm 2008-09-26 08:49 hola 2008-09-26 08:49 sad to hear the feature drop from tux3 2008-09-26 08:50 was hoping we could blow away zfs 2008-09-26 08:50 that was an awesome idea of dynamically increasing your fs across multiple hds 2008-09-26 08:56 pranith: well, lvm3 will be used for that, hopefully 2008-09-26 08:57 _hopefully_ 2008-09-26 08:57 yeah, the argument for not doing that is reasonable 2008-09-26 09:12 will tux3 be having plugins?? :D 2008-09-26 09:12 we can implement this as a plugin if possible 2008-09-26 09:12 ;) 2008-09-26 09:30 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-26 09:31 ACTION is sorry he missed the class from last night :-( 2008-09-26 09:37 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-26 10:20 -!- flips(~phillips@phunq.net) has joined #tux3 2008-09-26 10:22 nah, plugins for this are pretty much as implementing it to begin with 2008-09-26 10:22 as hard ;-) 2008-09-26 10:48 -!- Kirantpatil(~kiran@122.167.222.6) has joined #tux3 2008-09-26 10:48 -!- Kirantpatil(~kiran@122.167.222.6) has left #tux3 2008-09-26 10:59 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-09-26 12:24 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-26 12:42 -!- pgquiles(~pgquiles@42.Red-83-39-60.dynamicIP.rima-tde.net) has joined #tux3 2008-09-26 14:38 -!- pgquiles(~pgquiles@82.Red-81-33-103.dynamicIP.rima-tde.net) has joined #tux3 2008-09-26 14:53 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-26 15:21 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-26 15:35 -!- pgquiles(~pgquiles@82.Red-81-33-103.dynamicIP.rima-tde.net) has joined #tux3 2008-09-26 15:36 maze, plugins? 2008-09-26 15:36 hmm? 
2008-09-26 15:36 moment 2008-09-26 15:36 MaZe> nah, plugins for this are pretty much as implementing it to begin with 2008-09-26 15:37 earlier on talk of plugins for multi-disk 2008-09-26 15:37 I must have been d/c for that 2008-09-26 15:37 filesystem plugins? 2008-09-26 15:38 something like that 2008-09-26 15:38 implementing support for plugins in tux3 2008-09-26 15:38 and then implementing support for multi-disk as a plugin 2008-09-26 15:38 If I understood correctly 2008-09-26 15:38 I was commenting about 2-3 hours later 2008-09-26 15:38 plugins make me think of reiserfs 2008-09-26 15:39 (08:49:58 AM) pranith: sad to hear the feature drop from tux3 2008-09-26 15:39 (08:50:09 AM) pranith: was hoping we could blow away zfs 2008-09-26 15:39 (08:50:36 AM) pranith: that was an awesome idea of dynamically increasing your fs across multiple hds 2008-09-26 15:39 (08:56:11 AM) RzM|Away left the room (quit: Quit: Computer goes to sleep!). 2008-09-26 15:39 (08:56:47 AM) data: pranith: well, lvm3 will be used for that, hopefully 2008-09-26 15:39 (08:57:04 AM) pranith: _hopefully_ 2008-09-26 15:39 (08:57:31 AM) pranith: yeah, the argument for not doing that is reasonable 2008-09-26 15:39 (09:12:00 AM) pranith: will tux3 be having plugins?? :D 2008-09-26 15:39 (09:12:10 AM) pranith: we can implement this as a plugin if possible 2008-09-26 15:39 if there are plugins, they will certainly not land before initial merge 2008-09-26 15:39 and there will be no absorbing the volume manager into tux3 2008-09-26 15:40 when designing lvm3 we need to figure out some sort of fs-bdev interface which provides more info than currently available 2008-09-26 15:40 what tux3 will do is work more closely with the volume manager 2008-09-26 15:40 which we will also develop 2008-09-26 15:40 maze, did you know you were going to be developing a volume manager? 2008-09-26 15:40 heh 2008-09-26 15:40 serious 2008-09-26 15:40 see comment above 2008-09-26 15:40 which one?
2008-09-26 15:40 the fs driver in order to schedule some stuff 2008-09-26 15:40 needs to know more about bdev disk layout for non disk bdevs 2008-09-26 15:40 ie. raid multi-disk etc 2008-09-26 15:41 exactly 2008-09-26 15:41 the fs is going to be able to specify the volume map 2008-09-26 15:41 and to retrieve it from the lvm 2008-09-26 15:41 very important 2008-09-26 15:41 a driver for this is already sketched out 2008-09-26 15:41 it's called "table block device" 2008-09-26 15:42 and will be a plugin for lvm3 2008-09-26 15:42 which we are going to develop 2008-09-26 15:42 hmm interesting 2008-09-26 15:42 -!- pgquiles(~pgquiles@82.Red-81-33-103.dynamicIP.rima-tde.net) has joined #tux3 2008-09-26 15:42 covered stuff like this in my berlin talk 2008-09-26 15:43 and should probably get it on the agenda for some upcoming linux confab 2008-09-26 15:43 though sometimes I feel like I'm showing television to the family dog as far as understanding from other kernel devs goes 2008-09-26 15:43 hopefully repeated impressions make a difference 2008-09-26 15:44 that's one reason why we need to make some new kernel devs 2008-09-26 15:44 heh 2008-09-26 15:45 I'm being pulled in 4 directions here 2008-09-26 15:46 bio is pulling strongest I think 2008-09-26 15:46 and not having enough time to devote myself fully to any of these 2008-09-26 15:46 just don't get distracted by mm 2008-09-26 15:46 no, not even talking about in-kernel 2008-09-26 15:46 regard it as an interesting, funny little friend with occasionally curious opinions and you will be ok 2008-09-26 15:47 talking about my team at work, another team (kernel), and a 3rd team in the process of being created (networking), plus pure kernel as fourth 2008-09-26 15:47 fuck real life ;) 2008-09-26 15:48 anyway, the key is not to get the idea you have to understand the whole kernel at once 2008-09-26 15:48 even linus doesn't 2008-09-26 15:48 or akpm, though he probably gets closest 2008-09-26 15:51 of course you don't 2008-09-26 15:51
but it's best to understand as much as possible 2008-09-26 15:52 and preferably at least one level down from where you muck around 2008-09-26 15:52 true 2008-09-26 15:52 if the kernel internal apis were clean and well documented, this wouldn't be that needed 2008-09-26 15:52 with the total lack of docs, it's not possible to write non-buggy code 2008-09-26 15:52 sometimes you are just plain forced to proceed on induction though 2008-09-26 15:52 without knowing what it will actually do 2008-09-26 15:52 it depends on what you're trying to do of course 2008-09-26 15:53 anyway 2008-09-26 15:53 lxr is the answer 2008-09-26 15:53 writing bug-less code is easy if, and only if, there is no undocumented code in the layers beneath (and to a lesser extent above) you 2008-09-26 15:53 right, hence I've been reading code as a pastime 2008-09-26 15:53 knowing everything in advance helps you design, but is not strictly necessary for developing or debugging 2008-09-26 15:53 true 2008-09-26 15:54 but browsing code requires less commitment 2008-09-26 15:54 in other words, you can pick up the info on-demand for the latter two 2008-09-26 15:54 and that's something I have time for 2008-09-26 15:54 right 2008-09-26 15:54 also: taking in too much at once can lead to burn out and drop out 2008-09-26 15:55 true 2008-09-26 15:55 what does NPI mean?
2008-09-26 15:55 NFI 2008-09-26 15:55 <- "no fscking idea" 2008-09-26 15:56 heh 2008-09-26 15:56 it's a TLA thought up by a SFI 2008-09-26 16:08 -!- pgquiles(~pgquiles@82.Red-81-33-103.dynamicIP.rima-tde.net) has joined #tux3 2008-09-26 16:14 -!- pgquiles(~pgquiles@82.Red-81-33-103.dynamicIP.rima-tde.net) has joined #tux3 2008-09-26 16:21 Settlement-Free Interconnect 2008-09-26 16:21 apparently SFI is a valid TLA in networking 2008-09-26 16:21 just hit on it in a doc 2008-09-26 16:38 -!- pgquiles(~pgquiles@82.Red-81-33-103.dynamicIP.rima-tde.net) has joined #tux3 2008-09-26 17:00 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-26 17:01 -!- pgquiles(~pgquiles@82.Red-81-33-103.dynamicIP.rima-tde.net) has joined #tux3 2008-09-26 17:06 -!- pgquiles(~pgquiles@82.Red-81-33-103.dynamicIP.rima-tde.net) has joined #tux3 2008-09-26 17:37 -!- pgquiles(~pgquiles@82.Red-81-33-103.dynamicIP.rima-tde.net) has joined #tux3 2008-09-26 20:02 -!- ajonat(~ajonat@190.48.119.128) has joined #tux3 2008-09-26 20:30 -!- ajonat(~ajonat@190.48.119.128) has joined #tux3 2008-09-26 21:16 maze, in my lexicon a SFI is a stupid fscking idiot ;) 2008-09-26 21:16 somebody who likes to invent TLAs to be leet 2008-09-26 21:17 I suppose that would now include me 2008-09-26 21:18 course we could consider an exemption for those who only make them up as satire 2008-09-26 22:12 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-26 22:32 hey tim_dimm 2008-09-27 02:12 hey 2008-09-27 02:34 -!- flips(~phillips@phunq.net) has joined #tux3 2008-09-27 02:55 -!- pranith(7aa040b1@webchat.mibbit.com) has joined #tux3 2008-09-27 02:55 hello guys 2008-09-27 03:02 no one is here when im around 2008-09-27 03:02 :( 2008-09-27 03:02 its bad to be on the other side of the world 2008-09-27 03:08 yeah, we're all hard asleep 2008-09-27 03:11 hehe 2008-09-27 03:12 yeah 2008-09-27 03:12 wht u doin? 
2008-09-27 03:25 -!- pranith(7aa040b1@webchat.mibbit.com) has joined #tux3 2008-09-27 03:27 falling asleep while skimming code 2008-09-27 03:28 hmm 2008-09-27 03:28 which code are u skimming 2008-09-27 03:40 just some of the other places in the kernel which do options parsing 2008-09-27 04:01 hmm 2008-09-27 04:01 ok 2008-09-27 05:19 -!- Kirantpatil(~kiran@122.167.179.185) has joined #tux3 2008-09-27 05:22 -!- Kirantpatil(~kiran@122.167.179.185) has left #tux3 2008-09-27 07:26 -!- BSD(~bandan@38.117.250.152) has joined #tux3 2008-09-27 07:44 -!- pgquiles(~pgquiles@82.Red-81-33-103.dynamicIP.rima-tde.net) has joined #tux3 2008-09-27 09:14 -!- pgquiles(~pgquiles@82.Red-81-33-103.dynamicIP.rima-tde.net) has joined #tux3 2008-09-27 09:41 -!- pgquiles(~pgquiles@82.Red-81-33-103.dynamicIP.rima-tde.net) has joined #tux3 2008-09-27 09:45 -!- pgquiles(~pgquiles@82.Red-81-33-103.dynamicIP.rima-tde.net) has joined #tux3 2008-09-27 11:34 -!- pgquiles(~pgquiles@82.Red-81-33-103.dynamicIP.rima-tde.net) has joined #tux3 2008-09-27 11:43 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-27 12:19 -!- pranith(7aa040b1@webchat.mibbit.com) has joined #tux3 2008-09-27 12:19 heya 2008-09-27 12:20 hi. thought you wanted to post the univ-sessions? 2008-09-27 12:21 yeah, i was going through them... 2008-09-27 12:22 dint finish reading.. 2008-09-27 13:14 pranith: thanks 2008-09-27 13:15 data, welcome 2008-09-27 13:15 data, what do you do? 2008-09-27 13:15 for tux3? nothing but reading :) 2008-09-27 13:15 reading as in? 2008-09-27 13:16 reading the channel, university sessions, some code. But it's still exam-time, and since i am double majoring i don't have that much spare time 2008-09-27 13:16 not for tux, in gen 2008-09-27 13:17 you mean what i read in my leisure time? 2008-09-27 13:17 i meant if you went to college or something? 2008-09-27 13:17 u are majoring in? 
2008-09-27 13:17 computer science and math 2008-09-27 13:18 nice 2008-09-27 13:18 so yes, university, in germany 2008-09-27 13:18 which univ? 2008-09-27 13:18 if you know it, karlruhe, technical university 2008-09-27 13:18 soon to be known as KIT 2008-09-27 13:18 karlsruhe, actually 2008-09-27 13:19 hmm 2008-09-27 13:19 ok 2008-09-27 13:20 and what are you doing? 2008-09-27 13:20 im working 2008-09-27 13:20 completed my bachelors recently 2008-09-27 13:27 -!- ajonat(~ajonat@190.48.124.246) has joined #tux3 2008-09-27 13:50 sorry, i am not that talkative right now. still reading a little about real time systems and scheduling for my exam on monday 2008-09-27 13:50 oh 2008-09-27 13:50 all the best 2008-09-27 13:50 thanks 2008-09-27 13:51 it's not that hard, but they have a lot of analog to digital stuff in there 2008-09-27 13:51 where it gets kind of hairy 2008-09-27 13:51 op-amps e.g. 2008-09-27 13:52 hmm 2008-09-27 13:52 nice stuff 2008-09-27 13:52 u have that in rts? 2008-09-27 13:52 yes, dunno why 2008-09-27 13:52 i can see the relation, but still 2008-09-27 13:52 hmm, interesting 2008-09-27 13:53 closed loop controls with their laplace transforms, z-transforms 2008-09-27 13:53 hmm 2008-09-27 13:56 oh, and one should not forget all the stuff about cnc machines, programming of them, robot controls and ...
just too much stuff not really related to the topic 2008-09-27 14:11 the relationship between analog and realtime is deep and important 2008-09-27 14:11 filters come into realtime a lot 2008-09-27 14:12 and time derivatives and integrals 2008-09-27 14:12 your delta t has to be exact ;) 2008-09-27 14:12 or your rocket explodes 2008-09-27 14:12 file_bwrite: block write <0:0> 2008-09-27 14:12 ---- extent 0x0/4 ---- 2008-09-27 14:12 balloc extent -> [2/4] 2008-09-27 14:12 segs: 0x2/4 (1) 2008-09-27 14:12 group -1/0 at entry -1/0 2008-09-27 14:12 1 entry groups: 2008-09-27 14:12 0/1: 0 => 2/4; 2008-09-27 14:12 file_bwrite: block write <0:5> 2008-09-27 14:12 ---- extent 0x5/2 ---- 2008-09-27 14:12 balloc extent -> [9/2] 2008-09-27 14:12 segs: 0x9/2 (1) 2008-09-27 14:13 group 0/1 at entry 0/1 2008-09-27 14:13 1 entry groups: 2008-09-27 14:13 0/2: 0 => 2/4; 5 => 9/2; 2008-09-27 14:13 flush... Success 2008-09-27 14:13 extent writing a lot closer to working 2008-09-27 14:13 now two discontiguous extents formed and written into the dleaf 2008-09-27 14:14 flips: i see the connection... learnt enough about it :0 2008-09-27 14:45 next test is to rewrite a region that already has extents 2008-09-27 14:45 and expose the next flock of bugs 2008-09-27 14:45 shapor... 
2008-09-27 15:00 Ok, here we go again: 2008-09-27 15:00 ---- extent 0x0/7 ---- 2008-09-27 15:00 balloc extent -> [c/5] 2008-09-27 15:00 segs: 0xc/5 0x9/2 (2) 2008-09-27 15:00 group 0/1 at entry 0/2 2008-09-27 15:00 group 0/1 at entry 0/2 2008-09-27 15:00 1 entry groups: 2008-09-27 15:00 0/3: 0 => 2/4 c/5; 5 => 9/2; 0 => ; 2008-09-27 15:01 close actually 2008-09-27 15:01 well not actually 2008-09-27 15:01 buncha bugs 2008-09-27 15:33 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-27 15:34 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-27 15:50 -!- joededman(~chatzilla@S0106005004b0be73.ed.shawcable.net) has joined #tux3 2008-09-27 16:01 folks 2008-09-27 16:05 ok, let's see where this puppy strays afield 2008-09-27 16:09 flips: how far do you feel like skating today? 2008-09-27 16:10 hard to say 2008-09-27 16:10 I take it you had an adventure in mind? 2008-09-27 16:10 thinking about ti 2008-09-27 16:10 it 2008-09-27 16:29 -!- pgquiles(~pgquiles@82.Red-81-33-103.dynamicIP.rima-tde.net) has joined #tux3 2008-09-27 16:52 ok, there's the first bug, obvious enough: after retrieving an extent to see if it should be skipped, if it should not be skipped then we need to rewind before the next step, or save the extent as the current one and scan on from there 2008-09-27 16:53 maybe I should explain what I'm doing on the list so we can get a couple more hands pulling on the oars 2008-09-27 16:54 I guess it is probably better to avoid rewinds and save some cpu 2008-09-27 16:58 but then when we do rewind we need to include the extent we just found, so a rewind would have to not only reset the dwalk state but the saved extent just found there 2008-09-27 16:59 fiddly, but probably the most efficient way to do it 2008-09-27 17:03 getting close to sk8 oclock 2008-09-27 17:27 ah, dwalk_back is the answer 2008-09-27 17:27 unread the last returned extent 2008-09-27 19:45 -!- BSD(~bandan@38.117.250.152) has joined #tux3 2008-09-27
21:10 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-27 22:12 folks 2008-09-27 22:23 ok, next bug is clear 2008-09-27 22:23 clear braindamage 2008-09-27 22:24 have to truncate before repacking 2008-09-27 22:24 obvious :-P 2008-09-27 22:41 -!- ajonat(~ajonat@190.48.124.246) has joined #tux3 2008-09-27 22:48 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-09-27 22:52 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-09-28 00:21 -!- pranith(7aa040b1@webchat.mibbit.com) has joined #tux3 2008-09-28 02:06 -!- Aks(~ankitsriv@123.237.69.19) has joined #tux3 2008-09-28 02:07 -!- Aks(~ankitsriv@123.237.69.19) has left #tux3 2008-09-28 03:05 -!- Kirantpatil(~kiran@122.167.219.252) has joined #tux3 2008-09-28 03:05 -!- Kirantpatil(~kiran@122.167.219.252) has left #tux3 2008-09-28 03:05 -!- bobby(~bobby@122.162.68.241) has joined #tux3 2008-09-28 03:05 hey guys 2008-09-28 03:12 anyone awake? 2008-09-28 03:41 -!- bobby(~bobby@122.162.68.241) has joined #tux3 2008-09-28 03:47 -!- paola(~paola@ppp-219-23.20-151.libero.it) has joined #tux3 2008-09-28 05:23 -!- pgquiles(~pgquiles@166.Red-83-35-243.dynamicIP.rima-tde.net) has joined #tux3 2008-09-28 06:01 -!- pgquiles(~pgquiles@166.Red-83-35-243.dynamicIP.rima-tde.net) has joined #tux3 2008-09-28 07:02 -!- paola(~paola@ppp-157-16.20-151.libero.it) has joined #tux3 2008-09-28 08:04 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-28 08:04 -!- tim_dimm_(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-28 09:45 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3 2008-09-28 10:24 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3 2008-09-28 11:18 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3 2008-09-28 11:45 -!- pgquiles_(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-09-28 
11:53 -!- pgquiles__(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3 2008-09-28 12:10 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-28 12:14 -!- pranith(7aa24857@webchat.mibbit.com) has joined #tux3 2008-09-28 12:14 hey guys 2008-09-28 12:15 flipsout: was thinking about the unit testing you wanted for dwalk_pack 2008-09-28 12:15 i am not sure about how you go about doing that.. was reading the dleaf.c code today. 2008-09-28 12:16 u mentioned some unit test already present 2008-09-28 12:16 can u point out where that is? 2008-09-28 15:01 -!- paola(~paola@ppp-157-16.20-151.libero.it) has left #tux3 2008-09-28 15:34 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-28 16:02 -!- orgthingy(~orgthingy@62.150.55.188) has joined #tux3 2008-09-28 16:02 hi 2008-09-28 16:03 so, wait wait, why is this Tux3 better than ext3 ? 2008-09-28 16:03 orgthing, did you read the initial post? 2008-09-28 16:03 orgthingy 2008-09-28 16:03 initial post? 2008-09-28 16:04 "Tux3, a versioning filesystem" 2008-09-28 16:04 tux3.org seems empty :P 2008-09-28 16:04 did you follow the links there? 2008-09-28 16:04 i dont want to sound like an idiot, but what exactly is "versioning" filesystem ? 2008-09-28 16:05 another word for snapshots 2008-09-28 16:06 flips : and, how is tux2 opensource if it was never released? 2008-09-28 16:07 flips : and, what do you think is better than ext3 (discarding tux2 and tux3) in your opinion ? 2008-09-28 16:07 the code is linked on the site and is under gpl v3 2008-09-28 16:08 orgthingy, care to say something about yourself and where you are coming from? 2008-09-28 16:08 flips : well, what shall i say... Im just an opensource fan and my goal is to be UNIX genius? :P 2008-09-28 16:08 UNIX/Linux to be exact xD 2008-09-28 16:09 worked on which projects? 
2008-09-28 16:09 yes, mostly bug and features reports though 2008-09-28 16:09 used to do stuff with python iirc 2008-09-28 16:09 ext3 does not have versioning of any form, or extents 2008-09-28 16:09 flips : Id be happy to help with tux3, but not sure how this whole thing works 2008-09-28 16:10 but ill study it and understand it 2008-09-28 16:10 it is also slow at deleting 2008-09-28 16:10 yes, slow at deleting.. cant deny that :P 2008-09-28 16:10 it is also limited to files and volumes of a few TB 2008-09-28 16:11 ext4 is what you should be asking about 2008-09-28 16:11 which also has no snapshots 2008-09-28 16:12 few TB? 2008-09-28 16:12 isnt that quite.. a lot :P 2008-09-28 16:12 not these days, a TB disk costs a little over $100 2008-09-28 16:14 :| 2008-09-28 16:14 man, they lied to me then! 2008-09-28 16:14 i bought 250GB external HD for $90 !! 2008-09-28 16:15 and i was like "oh, quite fair price" ! 2008-09-28 16:15 yes, you got ripped off 2008-09-28 16:15 :( 2008-09-28 16:15 all prices were like that in different shops 2008-09-28 16:15 :'( 2008-09-28 16:16 anyway, flips, may you tell me about you? 2008-09-28 16:16 google me? 2008-09-28 16:18 lol, ok 2008-09-28 16:18 ACTION googles "flips" 2008-09-28 16:18 flips : nothing that seems to be u 2008-09-28 16:19 ACTION googles himself 2008-09-28 16:19 ACTION saw something embarrassing that happened 6 years ago 2008-09-28 16:19 me hides xD 2008-09-28 16:19 /me * 2008-09-28 16:20 try /whois flips 2008-09-28 16:20 maybe consider using your real name on irc 2008-09-28 16:21 I glaube dass du kanst Deutsch 2008-09-28 16:21 deutsch? 2008-09-28 16:21 german? 
2008-09-28 16:22 omg 2008-09-28 16:22 Daniel Phillips :| 2008-09-28 16:22 flips : nice to meet you 2008-09-28 16:23 nice to meet you 2008-09-28 16:23 flips : Im gonna learn german soon, but online :( 2008-09-28 16:23 no classes over here 2008-09-28 16:23 i mean, i gotta learn in LiveMocha 2008-09-28 16:23 doesn't matter, I just thought you were german because you are connected to a server in darmstadt 2008-09-28 16:23 flips : is there ANY GOOD WAY that I can learn about filesystems? tux3 sounds like a nice project 2008-09-28 16:24 http://en.wikipedia.org/wiki/Filesystems <- start here 2008-09-28 16:24 ah, wikipedia... :P 2008-09-28 16:25 http://lxr.linux.no/linux+v2.6.26.5/fs/open.c#L1106 <- continue here 2008-09-28 16:26 flips : thanks sir, but a usual feature i tell devoloper is: How about making a program for Windows and probably OS X that makes it possible to read your filesystem, which is tux3 is this situation"? 2008-09-28 16:27 exercise left for the interested reader 2008-09-28 16:27 because, FAT32 is bad, and seems to be only way, after NTFS, to be read by all Oses.. and since NTFS is proprietary, and not *that good*, i suggest making these kind of readers 2008-09-28 16:39 flips : may I ask what FileSystem you are using right now? 2008-09-28 16:40 ext3 2008-09-28 16:40 I see 2008-09-28 16:42 flips : it says in wikipedia that stuff get mounted at boot time.. but what about, like, umm.. we inserted a CD after we booted? 2008-09-28 16:42 how can it be automatically mounted? 2008-09-28 16:44 automount? 2008-09-28 16:44 flips, yes 2008-09-28 16:44 how exactly does automount work? 
2008-09-28 16:45 beyond the scope of this channel, http://www.google.com/search?q=automount 2008-09-28 16:47 ah, Google again xD 2008-09-28 17:20 finally, dwalk_next and dwalk_back seem to kind of make sense 2008-09-28 17:20 could be near the end of pain for extent writing 2008-09-28 17:21 checkin coming a little after sk8 oclock 2008-09-28 19:22 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-28 20:02 tim_dimm, ping 2008-09-28 20:09 folks 2008-09-28 20:11 flips: give me a few 2008-09-28 21:04 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-28 21:50 -!- Kirantpatil(~kiran@122.167.215.234) has joined #tux3 2008-09-28 21:50 -!- Kirantpatil(~kiran@122.167.215.234) has left #tux3 2008-09-28 22:45 -!- pranith(ca4bcee2@webchat.mibbit.com) has joined #tux3 2008-09-28 22:45 hey guys 2008-09-28 22:45 flips: you got some time? 2008-09-28 22:45 flips: need u to tell me about unit tests for tux3 2008-09-28 22:45 .. 2008-09-28 22:45 hi 2008-09-28 22:46 hello 2008-09-28 22:46 parnith, for example to run the dleaf unit test: make dleaf && make dleaftest 2008-09-28 22:46 ok, where are the tests written? dleaftest.c? 2008-09-28 22:47 dleaf test is written in dleaf.c 2008-09-28 22:47 it's a main routine that is only compiled if you're compiling just dleaf by itself 2008-09-28 22:47 the other unit tests are similar 2008-09-28 23:14 ok.. 2008-09-28 23:14 going through them 2008-09-28 23:25 -!- pranith(~bobby@nat-inn.mentorg.com) has joined #tux3 2008-09-28 23:27 -!- ajonat(~ajonat@190.48.124.246) has joined #tux3 2008-09-28 23:36 hmm 2008-09-28 23:48 not clear what the unit tests are testing? 
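A minimal sketch of the pattern flips describes, where a unit test is a main routine that is only compiled when the file is built on its own. The log doesn't show how the tux3 Makefile does this, so the FILL_GAPS_UNIT_TEST macro, the extent struct, the bump allocator and fill_gaps are all hypothetical. The test case is the rewrite flips runs later in the log: existing extents 0x0 => 2/4 and 0x5 => 9/2, a rewrite of 0x0/7, and the one-block hole at 4 filled with 4/1. The allocator starts at block 4 only so that the result matches that output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical extent record: 'count' blocks starting at logical address
 * 'logical', mapped to physical block 'physical'. Not tux3's format. */
struct extent { unsigned logical, physical, count; };

/* Hypothetical bump allocator standing in for the real block allocator. */
static unsigned next_free = 4;

static unsigned alloc_blocks(unsigned count)
{
	unsigned block = next_free;
	next_free += count;
	return block;
}

/* Cover [start, start + len) with the sorted, non-overlapping extents in
 * 'in', allocating fresh blocks for any holes. 'out' needs room for
 * 2 * n + 1 extents. Returns the number of extents written. */
static size_t fill_gaps(const struct extent *in, size_t n,
			unsigned start, unsigned len, struct extent *out)
{
	size_t k = 0;
	unsigned pos = start, end = start + len;

	for (size_t i = 0; i < n && pos < end; i++) {
		assert(in[i].logical >= pos);	/* sorted, at or after start */
		if (in[i].logical > pos) {
			unsigned gap = in[i].logical - pos;
			if (gap > end - pos)
				gap = end - pos;
			out[k++] = (struct extent){ pos, alloc_blocks(gap), gap };
			pos += gap;
			if (pos == end)
				break;
		}
		out[k++] = in[i];
		pos = in[i].logical + in[i].count;
	}
	if (pos < end)
		out[k++] = (struct extent){ pos, alloc_blocks(end - pos), end - pos };
	return k;
}

#ifdef FILL_GAPS_UNIT_TEST
/* Compiled only when this file is built by itself as a test program,
 * e.g. cc -DFILL_GAPS_UNIT_TEST fill_gaps.c (macro name is made up). */
int main(void)
{
	/* Existing 0x0 => 2/4 and 0x5 => 9/2, rewrite 0x0/7. */
	struct extent in[] = { { 0, 2, 4 }, { 5, 9, 2 } }, out[5];
	size_t k = fill_gaps(in, 2, 0, 7, out);

	assert(k == 3);
	assert(out[1].logical == 4 && out[1].physical == 4 && out[1].count == 1);
	printf("fill_gaps: ok\n");
	return 0;
}
#endif
```

Built normally, the file just provides fill_gaps; built with -DFILL_GAPS_UNIT_TEST it becomes its own test program, the same trick flips describes for dleaf.c.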
2008-09-28 23:56 btw, I'm coming through LA to SF on Friday 2008-09-28 23:56 I'll be around 2008-09-29 00:09 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-29 00:12 flips, going through the code 2008-09-29 00:12 trying to figure out the unit tests 2008-09-29 00:12 some are obvious, others not so obvious 2008-09-29 00:17 file_bwrite: block write <0:0> 2008-09-29 00:17 ---- extent 0x0/7 ---- 2008-09-29 00:17 existing extents: 0x0 => 2/4; 0x5 => 9/2; 2008-09-29 00:17 ---- rewind to 0x0 => 2/4 ---- 2008-09-29 00:17 balloc extent -> [4/1] 2008-09-29 00:17 segs: 0x2/4 0x4/1 0x9/2 (3) 2008-09-29 00:17 dwalk_chop_after: 1 groups, 0 entries in last 2008-09-29 00:17 1 entry groups: 2008-09-29 00:17 0/0: 2008-09-29 00:17 group 0/1 at entry -1/0 2008-09-29 00:17 group 0/1 at entry 1/1 2008-09-29 00:17 group 0/1 at entry 3/2 2008-09-29 00:17 1 entry groups: 2008-09-29 00:17 0/3: 0 => 2/4; 4 => 4/1; 5 => 9/2; 2008-09-29 00:17 flush... Success 2008-09-29 00:17 woohoo, first time this ever worked 2008-09-29 00:17 rewrite a region of file containing two discontiguous extents 2008-09-29 00:18 it properly fills in the 1 block gap between them 2008-09-29 00:18 nice 2008-09-29 00:41 hmm, a bug in set_bits, that has to hurt 2008-09-29 00:54 ah, no it was a bug in balloc_extent_from_range 2008-09-29 01:42 ACTION resets his firefox home page from google.com to tux3.org 2008-09-29 01:45 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3 2008-09-29 01:45 morning pgquiles 2008-09-29 01:46 hey flips 2008-09-29 01:47 I was thinking of you yesterday 2008-09-29 01:47 I'm touched ;) 2008-09-29 01:48 mkfs reserves 5% of the blocks for root, what's the use case for that? system crashes or is full and root comes to the rescue? 
2008-09-29 01:48 I'm asking because that accounts for a whopping 50GB in my 1TB disk, which is a lot of seemingly wasted space :-/ 2008-09-29 01:48 attempt to avoid DoS of root 2008-09-29 01:48 by ordinary user, in absense of quotas 2008-09-29 01:49 but 5% is excessive 2008-09-29 01:49 on the other hand, it does need to scale with time as everything gets more bloated 2008-09-29 01:49 I guess 1GB would more more than enough 2008-09-29 01:49 50 MB would be enough 2008-09-29 01:49 probably 2008-09-29 01:50 I looked at the source of mkfs and was surprised to discover it accepts -mDOUBLE, I thought it only accepted an int 2008-09-29 01:50 ( I used -m1 and it still hurts!) 2008-09-29 01:51 you never know how many blocks sombody might want to reserve 2008-09-29 01:51 takes a double... 2008-09-29 01:52 yes, that's dumb but does no harm 2008-09-29 01:52 email ted and ask for a decimal point 2008-09-29 01:53 "avoids fragmnentation"... I doubt that 2008-09-29 01:54 I think what the man page means there is, performance gets so pathetically bad when the filesystem is 95%+ full that we just don't allow it 2008-09-29 01:54 think of it as a lameness tax ;-) 2008-09-29 01:54 we will try to make tux3 perform reasonably well even at 99% full 2008-09-29 01:54 going to be lots of work, hope you are ready to help 2008-09-29 01:55 send us some of your spanish dev buddies 2008-09-29 02:00 :-D 2008-09-29 02:00 I don't know anybody in Spain developing filesystems 2008-09-29 02:01 althought I must say the tux3 university is really useful 2008-09-29 02:01 I hope I'll have some spare time to catch up 2008-09-29 02:01 eventually :-) 2008-09-29 02:02 i think we need a video version of tux3 U someday 2008-09-29 02:02 hi pgguiles 2008-09-29 02:13 -!- pgquiles_(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-09-29 03:07 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3 2008-09-29 03:25 hello ! 
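The arithmetic behind pgquiles's numbers, as a small sketch. Assuming 4 KiB blocks, a 1 TB (10^12 byte) disk has 244,140,625 blocks. 5% of that is 12,207,031 blocks (about 50 GB), -m1 still holds back 2,441,406 blocks (about 10 GB), and flips's 50 MB would correspond to roughly -m 0.005. reserved_blocks is a hypothetical helper that computes blocks * percent / 100 from a double percentage. That is roughly what mke2fs does with -m, but its exact rounding is not checked here.

```c
#include <assert.h>
#include <stdint.h>

/* Blocks held back for root when reserving 'percent' percent of a volume.
 * The percentage is a double, as mke2fs -m accepts fractional values. */
static uint64_t reserved_blocks(uint64_t blocks, double percent)
{
	assert(percent >= 0.0 && percent <= 100.0);
	return (uint64_t)((double)blocks * percent / 100.0);
}
```

For example, reserved_blocks(244140625, 5.0) gives 12207031 and reserved_blocks(244140625, 1.0) gives 2441406.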
2008-09-29 03:32 i finally understood what "versioning filesystem" really means :P 2008-09-29 03:33 how about saying it in your own words 2008-09-29 03:34 flips : its like when it keeps old copies of file or folder? 2008-09-29 03:35 partly 2008-09-29 03:38 ACTION keeps searching about "versioning filesystems" 2008-09-29 03:38 flips : today, Im gonna buy some books about that xD 2008-09-29 03:38 haha 2008-09-29 03:38 i have to learn about these things.. sounds interesting :P 2008-09-29 03:38 we're writing the books ;) 2008-09-29 03:39 source codes? 2008-09-29 03:41 orgthingy, did u understand the version pointer part? 2008-09-29 03:41 pranith : not yet 2008-09-29 03:41 about how maintaining versions this way is useful? 2008-09-29 03:41 hmm 2008-09-29 03:41 because once i want to learn something, i find out that i need to learn another thing 2008-09-29 03:41 complicated :P 2008-09-29 03:42 hmm, yeah 2008-09-29 03:42 kind of 2008-09-29 03:42 i had to read the document thrice 2008-09-29 03:42 before i got a semblence of understanding 2008-09-29 03:43 orgthingy, u from germany? 2008-09-29 03:43 peanitth : no, but im interested in german language 2008-09-29 03:43 :P 2008-09-29 03:44 ohk 2008-09-29 03:44 ACTION stares at http://www.ext3cow.com/Welcome_files/example1.jpg 2008-09-29 03:44 ACTION smiles 2008-09-29 04:06 -!- pgquiles__(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3 2008-09-29 04:30 flips, can u explain the fields in struct dwalk? 2008-09-29 04:31 pranith, sure 2008-09-29 04:31 specially about the gdict and edict fields 2008-09-29 04:31 they are just what is necessary for dwalk_next to work efficiently 2008-09-29 04:31 ok, what are they used for? 
2008-09-29 04:31 needs to have a pointer to an extent and a group and an entry, and a limit for each to know when to go to the next 2008-09-29 04:32 exbase is a little different, the ->limit field of entry is relative to that 2008-09-29 04:32 and the "mock" fields are for calculating the finished size of a packed leaf, without actually writing the leaf 2008-09-29 04:32 i suppose exbase is the base of the extent 2008-09-29 04:33 yes 2008-09-29 04:33 i.e., the current extent's pointer 2008-09-29 04:33 no 2008-09-29 04:33 it's the lowest extent for an entire group of entries 2008-09-29 04:33 there is a description of the dleaf format somewhere 2008-09-29 04:34 ping shapor about it maybe 2008-09-29 04:34 he's the expert 2008-09-29 04:34 ohk, will do 2008-09-29 04:34 writing a mail.. that will better document it 2008-09-29 04:34 notices that the "limit" field of the entry and the "count" field of the group are only one byte, that goes a long way towards explaining some of the apparent complexity 2008-09-29 04:35 a dleaf index is highly compressed and therefore a little tricky to edit 2008-09-29 04:35 hmm 2008-09-29 04:35 yeah, u were trying to simplify that 2008-09-29 04:36 with an api... 2008-09-29 04:36 successfully, see the latest checkins 2008-09-29 04:36 filemap.c is now pretty obvious I think 2008-09-29 04:37 ok, how do i update using hg? i usually remove the tux3 folder and pull it again :( 2008-09-29 04:37 that works 2008-09-29 04:37 just hg pull will do it 2008-09-29 04:37 ok 2008-09-29 04:37 then hg update I think 2008-09-29 04:37 ok 2008-09-29 04:39 night 2008-09-29 04:40 gn 2008-09-29 04:42 flips : have you compared ext3cow to tux3 ? 
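A rough sketch of the cursor flips describes: a position in the group, entry and extent arrays plus a limit for each, so the walk knows when to move on. The layout is a simplified stand-in, not the real dleaf format (the comment beginning "Leaf index format" in dleaf.c is the reference). gcount, elimit, struct walk and walk_next are hypothetical. As flips says, each entry's limit is counted from exbase, the first extent of its group. The mock fields used for sizing a packed leaf are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-level index: group g owns gcount[g] entries, and entry
 * e's extents end at elimit[e], counted from its group's exbase. */
struct walk {
	const uint8_t *gcount, *elimit;	/* one-byte fields, as in dleaf */
	unsigned ngroups;
	unsigned group, entry, ingroup;	/* current group, entry, entry within group */
	unsigned exbase, extent;	/* group's first extent, offset from it */
};

/* Return the index of the next extent and report its group and entry,
 * or return -1 when the index is exhausted. */
static int walk_next(struct walk *w, unsigned *group, unsigned *entry)
{
	while (w->group < w->ngroups) {
		if (w->ingroup == w->gcount[w->group]) {
			/* group done: the next group's extents follow this one's */
			w->exbase += w->extent;
			w->extent = 0;
			w->ingroup = 0;
			w->group++;
			continue;
		}
		assert(w->extent <= w->elimit[w->entry]);
		if (w->extent == w->elimit[w->entry]) {
			/* entry done: move to the next entry in this group */
			w->entry++;
			w->ingroup++;
			continue;
		}
		*group = w->group;
		*entry = w->entry;
		return (int)(w->exbase + w->extent++);
	}
	return -1;
}
```

With gcount = {2, 1} and elimit = {2, 3, 1}, successive calls yield extents 0 and 1 (group 0, entry 0), 2 (group 0, entry 1), 3 (group 1, entry 2), then -1.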
2008-09-29 06:08 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3 2008-09-29 07:16 -!- orgthingy_(~orgthingy@62.150.55.188) has joined #tux3 2008-09-29 07:28 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3 2008-09-29 07:54 -!- smitht(~chatzilla@ool-182f94db.dyn.optonline.net) has joined #tux3 2008-09-29 07:57 -!- smitht(~chatzilla@ool-182f94db.dyn.optonline.net) has left #tux3 2008-09-29 10:08 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-29 10:18 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-29 11:21 -!- SEJeff(~jeff__@66.151.59.138) has joined #tux3 2008-09-29 12:13 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-29 13:06 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-09-29 13:10 -!- orgthingy(~orgthingy@62.150.55.188) has joined #tux3 2008-09-29 14:16 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-29 14:22 -!- kbingham(~kbingham@92.21.238.93) has joined #tux3 2008-09-29 15:15 bah 2008-09-29 15:15 ACTION is kind of annoyed by how sloppy the lockdep code is 2008-09-29 15:16 like folks never figured out in the Linux community to use a lot of small abstraction so that function bodies express things simply and are readable in that way as well 2008-09-29 15:16 it's just extra brain wankery that I could do without 2008-09-29 15:30 -!- orgthingy(~orgthingy@62.150.55.188) has left #tux3 2008-09-29 15:30 -!- orgthingy(~orgthingy@62.150.55.188) has joined #tux3 2008-09-29 15:30 ops 2008-09-29 16:01 -!- orgthingy(~orgthingy@62.150.55.188) has joined #tux3 2008-09-29 16:06 so, are you all developers in tux3? 
2008-09-29 16:56 -!- orgthingy(~orgthingy@62.150.55.188) has joined #tux3 2008-09-29 18:02 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-29 18:43 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-29 19:04 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-29 19:42 bh, ingo isn't big on abstraction, more a hack it now kinda guy 2008-09-29 19:42 has its uses 2008-09-29 19:43 bh, but you can always abstract it as it should be 2008-09-29 19:43 beauty of open source 2008-09-29 19:44 orgthingy, ext3cow looks like a fine project 2008-09-29 19:44 but it's missing a few essential things 2008-09-29 19:45 like: writable snapshots, snapshots of snapshots, deletion of snapshots 2008-09-29 19:46 it's very clever as far as it goes 2008-09-29 19:46 oh and limited to 2 TB 2008-09-29 19:47 like ext3 2008-09-29 19:47 well 2008-09-29 19:47 maybe they have increased that to 16TB 2008-09-29 19:47 still too small by today's standards 2008-09-29 19:48 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-29 19:53 flips: trying to get my patches integrate, I fear, would be a pain 2008-09-29 19:53 if they're good developers will use them integrated or not 2008-09-29 19:53 they're for devs anyway 2008-09-29 19:54 yeah, looking at his code drives me up the fucking wall at times 2008-09-29 19:54 I mean, I'm use to it now after all of these years, but, man, it's massive mess for clean up for other developers 2008-09-29 19:54 like with the scheduler 2008-09-29 19:54 or rtmutex, et.c.. 
2008-09-29 19:55 there's a certain point where you can't really do much other than hack it in a more limited way without a mass refactorng 2008-09-29 20:50 -!- Kirantpatil(~kiran@122.167.178.24) has joined #tux3 2008-09-29 20:50 -!- Kirantpatil(~kiran@122.167.178.24) has left #tux3 2008-09-29 21:02 -!- ajonat(~ajonat@190.48.120.169) has joined #tux3 2008-09-29 22:29 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-29 22:59 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-29 23:57 -!- bobby(~bobby@nat-inn.mentorg.com) has joined #tux3 2008-09-29 23:57 heya 2008-09-29 23:59 hello all, anyone here? 2008-09-30 00:00 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-09-30 00:00 MaZe, hello 2008-09-30 00:07 hi pranith 2008-09-30 00:09 flips, no reply from shapor :( 2008-09-30 00:09 about the struct fields description... 2008-09-30 00:09 may be u can reply ? ;) 2008-09-30 00:09 right 2008-09-30 00:11 pranith, have you read the comment that begins: /* Leaf index format 2008-09-30 00:11 ? 2008-09-30 00:11 in dleaf.c 2008-09-30 00:11 i tried :) 2008-09-30 00:12 the header contains the two level index followed by the table of extents 2008-09-30 00:13 i dint understand the limit on number of versions at the same level 2008-09-30 00:13 dinner time for me 2008-09-30 00:14 ohkies 2008-09-30 00:14 it's simple: you can't have more than 255 entries in one group, therefore can't have more than 255 entries with the same logical address 2008-09-30 00:14 well 2008-09-30 00:14 actually that is probably wrong now 2008-09-30 00:15 hmm, anything changed? 2008-09-30 00:15 you can have multiple dleaf groups with the same logical address now I think 2008-09-30 00:15 sure, lots of code changed 2008-09-30 00:15 every day 2008-09-30 00:15 later... 
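The 255 limit follows from the one-byte group count. A minimal sketch, assuming the newer behaviour flips thinks is now in place, where a full group spills into another group for the same logical address. struct group, GROUP_MAX and add_entry are hypothetical and far simpler than dleaf's real group and entry records.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GROUP_MAX 255	/* largest count a one-byte field can hold */

/* Hypothetical group header: all entries in a group share one logical
 * address (a stand-in for the shared address bits in the real format). */
struct group { uint32_t logical; uint8_t count; };

/* Add one entry for 'logical' to 'ngroups' groups (capacity 'cap'),
 * opening a new group when the last one is full or belongs to another
 * address. Returns the new number of groups. */
static unsigned add_entry(struct group *groups, unsigned ngroups,
			  unsigned cap, uint32_t logical)
{
	struct group *last = ngroups ? &groups[ngroups - 1] : NULL;

	if (!last || last->count == GROUP_MAX || last->logical != logical) {
		assert(ngroups < cap);
		groups[ngroups] = (struct group){ logical, 0 };
		last = &groups[ngroups++];
	}
	last->count++;
	return ngroups;
}
```

Adding 300 entries for one address leaves two groups holding 255 and 45 entries.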
2008-09-30 00:15 okies 2008-09-30 01:59 folks 2008-09-30 01:59 ACTION is back from a night of goofing off 2008-09-30 01:59 feels great 2008-09-30 01:59 I was working myself into the ground much of last week 2008-09-30 02:14 -!- ajonat(~ajonat@190.48.120.169) has joined #tux3 2008-09-30 02:14 -!- orgthingy(~orgthingy@62.150.55.188) has joined #tux3 2008-09-30 02:14 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-09-30 02:14 -!- SEJeff(~jeff__@66.151.59.138) has joined #tux3 2008-09-30 02:14 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3 2008-09-30 02:14 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-09-30 02:14 -!- flips(~phillips@phunq.net) has joined #tux3 2008-09-30 02:14 -!- ceatinge(~ceatinge@72.232.13.50) has joined #tux3 2008-09-30 02:14 -!- nataliep(~nataliep@207.47.98.129.static.nextweb.net) has joined #tux3 2008-09-30 02:14 -!- data(~data@echo489.server4you.de) has joined #tux3 2008-09-30 02:14 -!- Bushman(~marcin@c-76-23-106-132.hsd1.sc.comcast.net) has joined #tux3 2008-09-30 02:14 -!- shapor(~shapor@yzf.shapor.com) has joined #tux3 2008-09-30 02:14 -!- vcgomes(~vcgomes@li17-238.members.linode.com) has joined #tux3 2008-09-30 02:14 -!- ChanServ changed mode/#tux3 -> -o tux3bot 2008-09-30 02:31 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3 2008-09-30 03:01 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3 2008-09-30 03:26 -!- Kirantpatil(~kiran@122.166.94.37) has joined #tux3 2008-09-30 03:26 hello 2008-09-30 03:26 ah, what a great night ^______^ 2008-09-30 03:35 -!- Kirantpatil(~kiran@122.166.94.37) has left #tux3 2008-09-30 03:35 -!- Kirantpatil(~kiran@122.166.94.37) has joined #tux3 2008-09-30 03:53 orgthingy, where from/ 2008-09-30 03:53 ? 2008-09-30 03:54 why everybody keeps asking where im from xD 2008-09-30 04:10 why? 
you have a problem in that/ 2008-09-30 04:17 pranith : well, not really 2008-09-30 04:17 but i dont usually say any personal info in IRC :P 2008-09-30 04:18 ohk 2008-09-30 04:18 i dint knw that ones country is personal 2008-09-30 04:20 hi pranith 2008-09-30 04:21 Kirantpatil, helo 2008-09-30 04:21 is the session over today morning ? 2008-09-30 04:22 again i failed join .. 2008-09-30 04:22 TUX 3 university session. 2008-09-30 04:23 hmm 2008-09-30 04:23 yeah, morning is a bad time for people here 2008-09-30 04:23 i too miss it 2008-09-30 04:23 usually get it from the logs 2008-09-30 04:23 i am from bengaluru.. 2008-09-30 04:24 how about you.. 2008-09-30 04:24 delhi 2008-09-30 04:24 oh cool.. 2008-09-30 04:24 what do u do? 2008-09-30 04:24 i run Freesoftware training center.. 2008-09-30 04:24 oh 2008-09-30 04:24 which place in bangy? 2008-09-30 04:25 Driver programming, Administration.. 2008-09-30 04:25 it is in rajaji nagar. 2008-09-30 04:25 hmm 2008-09-30 04:25 i knw only koramangala 2008-09-30 04:25 and marathahalli 2008-09-30 04:25 ok.. 2008-09-30 04:25 and mg road ;) 2008-09-30 04:26 i am planning for giving filesystem training. 2008-09-30 04:26 so i am preparing for it.. 2008-09-30 04:26 oh 2008-09-30 04:26 nice 2008-09-30 04:26 i have it scheduled from Nov 1. 2008-09-30 04:27 hmm 2008-09-30 04:27 too soon i guess 2008-09-30 04:27 hope for getting some contributors ... 2008-09-30 04:27 oh 2008-09-30 04:27 which level are the students here? 2008-09-30 04:28 basically they span from college grads to experienced fellows.. 2008-09-30 04:28 hmm 2008-09-30 04:28 ok 2008-09-30 04:29 my motives are to spread linux kernel programming in easy way 2008-09-30 04:30 here is our website www.turtlelinuxlabs.in 2008-09-30 04:30 nice 2008-09-30 04:31 i am still in learning phase of filesystems.. 2008-09-30 04:32 hmm 2008-09-30 04:32 i tried to apply the patch of daniels posted in lwn.net and compile the kernel.. 2008-09-30 04:32 it was showing some error. 
2008-09-30 04:34 give me some guidelines on this.. 2008-09-30 04:37 which patch? 2008-09-30 04:37 tux3 is not yet in kernel 2008-09-30 04:38 its still in userspace in fuse 2008-09-30 04:38 you dont need to compile the kernel to test this 2008-09-30 04:38 just use the fuse version 2008-09-30 04:39 please see this http://lwn.net/Articles/299740/ 2008-09-30 04:39 i followed that link.. 2008-09-30 04:41 hmm, i think i missed that 2008-09-30 04:41 :( 2008-09-30 04:41 what shall i do then.. 2008-09-30 04:41 flips, why dont u cc to tux3?? 2008-09-30 04:42 im not sure.. 2008-09-30 04:42 ive never compiled this in a kernel before.. 2008-09-30 04:43 where can i get the fuse version.. 2008-09-30 04:48 am i doing right here.. 2008-09-30 04:51 hmm 2008-09-30 04:51 use the mercurial repo 2008-09-30 04:52 hg pull http://phunq.net/tux3 2008-09-30 04:52 install mercurial 2008-09-30 04:53 ok, i will try that 2008-09-30 04:57 then cd tux3/user/test 2008-09-30 04:57 make && make debug 2008-09-30 04:58 it will mount in /tmp/ 2008-09-30 05:04 thanks pranith. 2008-09-30 05:12 welcome :) 2008-09-30 05:13 i am getting some errors, shall i paste it here 2008-09-30 05:13 i am using ubuntu gibbon. 2008-09-30 05:15 tux3.c:14:18: error: popt.h: No such file or directory 2008-09-30 05:16 sudo apt-get install libpopt-dev 2008-09-30 05:34 Kirantpatil, worked? 2008-09-30 05:42 yes it worked. 2008-09-30 05:43 i am just execting sudo make testfuse 2008-09-30 05:50 ok.. i played with testfuse and testfs 2008-09-30 05:50 they are working fine.. 2008-09-30 05:51 next what should i do ?? 2008-09-30 05:51 work with dleaf and dleaftest 2008-09-30 05:51 make dleaf 2008-09-30 05:51 make dleaftest 2008-09-30 05:51 ./dleaf 2008-09-30 05:51 there is a bug with testfuse 2008-09-30 05:51 in readdir... 
2008-09-30 05:52 ls 2008-09-30 05:52 touch hello 2008-09-30 05:52 ls 2008-09-30 05:52 rm hello 2008-09-30 05:52 ls 2008-09-30 05:53 i am now installing valgrind 2008-09-30 05:54 no need 2008-09-30 05:54 u can run it directly 2008-09-30 05:54 ./dleaf 2008-09-30 05:54 ok.. 2008-09-30 05:57 i did run ./dleaf 2008-09-30 05:57 it is showing lot of dwalk messages.. 2008-09-30 05:58 i didnt understand where should i do "touch hello" "ls" and "rm hello" 2008-09-30 06:00 make debugfs 2008-09-30 06:00 "make debug" 2008-09-30 06:00 go to /tmp/test 2008-09-30 06:00 them do touch and rm then ls 2008-09-30 06:03 yeah 2008-09-30 06:03 root@kiran-desktop:/tmp/test# ls 2008-09-30 06:03 ???@???? 2008-09-30 06:04 this is how looks after "rm hello" 2008-09-30 06:04 mode 0100666 uid 0 gid 0 root d:1 2008-09-30 06:04 ---- get attr for '/' ---- 2008-09-30 06:04 ---- get attr for '/' ---- 2008-09-30 06:04 ---- readdir '/' at 0 ---- 2008-09-30 06:04 ---- get attr for '� 2008-09-30 06:04 @�' ---- 2008-09-30 06:04 ---- get attr for '/� 2008-09-30 06:04 @�' ---- 2008-09-30 06:04 ---- readdir '/' at 1000 ---- 2008-09-30 06:05 in debug message. 2008-09-30 06:10 then i think i need to understand the code .. 2008-09-30 06:10 am i right.. 2008-09-30 06:13 yup 2008-09-30 06:13 u need to.. 2008-09-30 06:13 something wrong in readdir 2008-09-30 06:13 i dint look further 2008-09-30 06:49 -!- Kirantpatil(~kiran@122.166.94.37) has left #tux3 2008-09-30 07:33 -!- pgquiles_(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-09-30 07:41 flips, there? 
2008-09-30 07:47 -!- pgquiles__(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3 2008-09-30 09:21 -!- orgthingy(~orgthingy@62.150.55.188) has joined #tux3 2008-09-30 09:27 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-30 09:38 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-09-30 09:49 -!- Kirantpatil(~kiran@122.167.222.94) has joined #tux3 2008-09-30 09:49 -!- Kirantpatil(~kiran@122.167.222.94) has left #tux3 2008-09-30 09:50 -!- Kirantpatil(~kiran@122.167.222.94) has joined #tux3 2008-09-30 09:50 -!- Kirantpatil(~kiran@122.167.222.94) has left #tux3 2008-09-30 09:52 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-30 10:28 -!- Kirantpatil(~kiran@122.167.222.94) has joined #tux3 2008-09-30 10:28 -!- Kirantpatil(~kiran@122.167.222.94) has left #tux3 2008-09-30 10:48 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-30 10:51 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-30 12:01 -!- orgthingy(~orgthingy@62.150.55.188) has joined #tux3 2008-09-30 12:43 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-09-30 17:10 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-09-30 18:58 -!- ajonat(~ajonat@190.48.107.189) has joined #tux3 2008-09-30 19:32 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-30 19:52 19:52:21 2008-09-30 19:54 that's true 2008-09-30 19:54 ah, and I missed the ping from pranith too 2008-09-30 19:54 yesterday 2008-09-30 19:55 I guess I'd better fix the readdir bug in fuse 2008-09-30 19:55 specially as a provisional fix has been offered 2008-09-30 19:55 -!- ajonat_(~ajonat@190.48.122.185) has joined #tux3 2008-09-30 19:59 t -30 & counting 2008-09-30 20:00 t -> tux3 2008-09-30 20:00 browsers running? 
2008-09-30 20:00 mayhaps 2008-09-30 20:01 we start here: http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2040 2008-09-30 20:01 or maybe we should start from where this is called in _copy2 2008-09-30 20:01 _2copy 2008-09-30 20:01 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2106 2008-09-30 20:02 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-30 20:02 razvanm: http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2106 2008-09-30 20:02 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-09-30 20:02 ACTION is sorry that he is late 2008-09-30 20:02 flips: razvanm: http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2106 2008-09-30 20:02 ACTION is glad you're here 2008-09-30 20:02 page = __grab_cache_page(mapping, index); 2008-09-30 20:03 ok, we're getting a cache page that user data will be copied onto 2008-09-30 20:03 later that page will be added to a bio and thrown at a device 2008-09-30 20:03 but today we're just going to look at the page cache 2008-09-30 20:04 that is, the list of pages belonging to a particular inode that have been read in via some buffer IO operation 2008-09-30 20:04 or directly created, as here 2008-09-30 20:04 since we know we're going to write to this page, normally the entire thing, there is no need to read it first 2008-09-30 20:05 we just "grab" it, and by that, viro means look into the cache and allocated a page if one is not already there 2008-09-30 20:06 so lets got to http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2040 and see how it works 2008-09-30 20:06 quick q: this "grab" is unique to this case? 
2008-09-30 20:06 pretty much 2008-09-30 20:06 page cache ops are highly non-orthogonal 2008-09-30 20:06 there may or may not be justification for that 2008-09-30 20:07 they just kind of grew from usage, like most of linux 2008-09-30 20:07 and the comment claims it's just for buffered writes 2008-09-30 20:07 possibly true 2008-09-30 20:07 should we take a look at what mapping and index? :P 2008-09-30 20:08 struct address_space * mapping 2008-09-30 20:08 we'll be looking at those, yes 2008-09-30 20:08 aaa... this was in some time ago 2008-09-30 20:08 that's what this is all about 2008-09-30 20:08 pgoff_t index; 2008-09-30 20:08 index is for practical purposes unsigned int 2008-09-30 20:08 that means 32 bit on 32 arches 2008-09-30 20:08 interesting that pgoff_t seems to be a page offset, but it would make sense to be a file offset div page_size ? 2008-09-30 20:09 maybe just a misnomer 2008-09-30 20:09 limiting the size of any file to 2^(32 + 12) 2008-09-30 20:09 oh, as in offset into file in pages 2008-09-30 20:09 exactly 2008-09-30 20:10 bad terminology 2008-09-30 20:10 -!- Kirantpatil(~kiran@122.167.219.78) has joined #tux3 2008-09-30 20:10 does this mean files can't be larger than 16TB? 2008-09-30 20:10 (on 32-bit arch) 2008-09-30 20:10 yes 2008-09-30 20:10 that's where that comes from 2008-09-30 20:10 volumes too 2008-09-30 20:10 does any linux filesystem workaround this somehow? 2008-09-30 20:10 volumes? 2008-09-30 20:10 because each volume has a page cache dedicated to non-file pages on the volume, that is, metadata 2008-09-30 20:11 there is no workaround 2008-09-30 20:11 "speed of sound in a 32 bit vacuum" 2008-09-30 20:11 :p 2008-09-30 20:11 :D 2008-09-30 20:11 ok, what does the index index? 2008-09-30 20:11 A: a radix tree 2008-09-30 20:12 let's drill down into find_lock_page, which is used in more than one place thankfully 2008-09-30 20:12 index = pos >> PAGE_CACHE_SHIFT; 2008-09-30 20:12 so how does tux3 scale beyond this? 2008-09-30 20:12 it doesn't? 
2008-09-30 20:12 razvanm, good point 2008-09-30 20:13 razvanm, you will see code like that in tux3.c 2008-09-30 20:13 it does not on 32 bit 2008-09-30 20:13 fact of life 2008-09-30 20:13 ACTION sits down in the back row 2008-09-30 20:13 hey shapor 2008-09-30 20:13 hi flips 2008-09-30 20:13 ACTION throws some chalk 2008-09-30 20:13 always wanted to do that 2008-09-30 20:13 is it illegal yet? 2008-09-30 20:14 so you simply can't mount such a large tux3 fs on a 32-bit os? 2008-09-30 20:14 simply can't 2008-09-30 20:14 we'd better produce a nice error though 2008-09-30 20:14 because somebody will try 2008-09-30 20:14 hm 2008-09-30 20:14 to tell the truth, it would not be that big a deal to fix 2008-09-30 20:14 somebody who wants to is welcome 2008-09-30 20:15 pretty easy hack for a great deal of fame 2008-09-30 20:15 ACTION listens for the thundering herd of volunteers 2008-09-30 20:15 ok, let's go to find_lock_page 2008-09-30 20:15 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L661 2008-09-30 20:16 takes a mapping and index 2008-09-30 20:16 mapping is what tux3 calls "map" 2008-09-30 20:16 I called it map because that saved me about 80,000 keystrokes over the life of the project 2008-09-30 20:17 how does a page become a pagecache page? 2008-09-30 20:17 also, a tux3 userspace map maps blocks, whereas linux page cache maps pages 2008-09-30 20:17 shapor, we're looking at that right now 2008-09-30 20:17 somewhere in here we will find an alloc_pages(order 1) 2008-09-30 20:17 order 0 I mean 2008-09-30 20:18 first thing we do is try to find it already in the radix tree, but let's skip that and find out what happens when it's not there 2008-09-30 20:18 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-09-30 20:18 nothing happens :P 2008-09-30 20:18 exactly 2008-09-30 20:18 this function does not allocate pages 2008-09-30 20:18 alloc_pages(order n) allocates 2^n pages, with linear physical addresses? 
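The size limit in numbers. A sketch, assuming 4 KiB pages and a 32-bit pgoff_t as on 32-bit architectures: index = pos >> PAGE_CACHE_SHIFT, so a 32-bit index reaches 2^(32 + 12) bytes, which is 16 TiB. The same cap applies to a volume through its metadata page cache. volume_fits is a hypothetical mount-time check of the kind flips asks for, so that mount can report a clear error for an oversized volume.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED 12	/* assumes 4 KiB pages (PAGE_CACHE_SHIFT == 12) */

/* Page-cache index of a byte offset: index = pos >> PAGE_CACHE_SHIFT. */
static uint64_t page_index(uint64_t pos)
{
	return pos >> PAGE_SHIFT_ASSUMED;
}

/* Bytes reachable through an index 'index_bits' wide: 2^(index_bits + 12). */
static uint64_t max_cache_bytes(unsigned index_bits)
{
	assert(index_bits + PAGE_SHIFT_ASSUMED < 64);	/* keep the shift defined */
	return (uint64_t)1 << (index_bits + PAGE_SHIFT_ASSUMED);
}

/* Hypothetical mount-time check: does every byte of the volume fit in a
 * page cache whose index is 'index_bits' wide? */
static int volume_fits(uint64_t volume_bytes, unsigned index_bits)
{
	return index_bits + PAGE_SHIFT_ASSUMED >= 64 ||
	       volume_bytes <= max_cache_bytes(index_bits);
}
```

max_cache_bytes(32) is 17592186044416 (16 TiB), so volume_fits accepts a 16 TiB volume with a 32-bit index and rejects anything one byte larger.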
2008-09-30 20:19 Returns zero if the page was not present. 2008-09-30 20:19 ok, let's go back up to _2copy and find out where the page is really alloced 2008-09-30 20:19 if we don't find it here 2008-09-30 20:20 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2049 2008-09-30 20:20 status = -ENOMEM; 2008-09-30 20:20 break 2008-09-30 20:20 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2107 2008-09-30 20:20 2049 page = page_cache_alloc(mapping); 2008-09-30 20:21 8-) 2008-09-30 20:21 :D 2008-09-30 20:21 better ;-) 2008-09-30 20:21 right 2008-09-30 20:21 knew it was in there ;) 2008-09-30 20:22 74static inline struct page *page_cache_alloc(struct address_space *x) 75{ 76 return __page_cache_alloc(mapping_gfp_mask(x)); 77} 2008-09-30 20:22 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L500 2008-09-30 20:22 which is just a call to alloc_pages 2008-09-30 20:22 as promised 2008-09-30 20:22 return alloc_pages(gfp, 0); 2008-09-30 20:22 what is the point of page_cache_alloc? 2008-09-30 20:23 some new bs about mapping_gfp_mask 2008-09-30 20:23 calling __page_cache_alloc 2008-09-30 20:23 if (cpuset_do_page_mem_spread()) { 2008-09-30 20:23 for numa 2008-09-30 20:23 shapor, probably little point if you really dig 2008-09-30 20:23 lots of accumlated cruft in there 2008-09-30 20:23 why grabing fails if adding to the lru fails? 2008-09-30 20:23 it's basically a numa-diverse alloc_pages(0) 2008-09-30 20:24 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2057 2008-09-30 20:24 goto repeat; 2008-09-30 20:24 notice it shouldn't fail 2008-09-30 20:24 razvanm, because that got a lot more complex recently 2008-09-30 20:24 let's take a look at it 2008-09-30 20:25 getting well outside the scope of vfs 2008-09-30 20:25 I guess there must be a reason why the page must be in the lru. Is there an obvious one? 
:P 2008-09-30 20:25 I think every page must be 2008-09-30 20:25 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L459 <- :P 2008-09-30 20:25 otherwise how do you know what to flush on memory low condition? 2008-09-30 20:25 it's about reverse mapping 2008-09-30 20:26 lots of complexity has been added to optimize it 2008-09-30 20:26 oh sorry 2008-09-30 20:26 I was blathering 2008-09-30 20:26 O:-) 2008-09-30 20:26 truth is, it's just a wrapper for the radix tree insert 2008-09-30 20:26 MaZe: to deal with the low memory you need some of the pages to be there :P 2008-09-30 20:27 add to page cache should never, ever fail 2008-09-30 20:27 but it could 2008-09-30 20:27 RazvanM: I think that's with extremely low memory conditions 2008-09-30 20:27 if it does fail we are in deep doodoo 2008-09-30 20:27 :D 2008-09-30 20:27 flips: yeah i see it gets called a few times in that file 2008-09-30 20:27 not jsut extremely low, bug buggy in the kernel bug sense 2008-09-30 20:27 the page could already be there, probably we're not fully locked against smp, and thus could potentially hit this on 2 cpus 2008-09-30 20:27 shapor, yes, this is the main interface to the page cache 2008-09-30 20:28 maze, we are fully locked against smp 2008-09-30 20:28 necessarily 2008-09-30 20:28 so where does the EEXIST check come from? 2008-09-30 20:28 write_lock_irq does that, and turns off interrupts for good measure 2008-09-30 20:29 if the page is already there, somebody needs to tell us 2008-09-30 20:29 MaZe: the page is already in lru, right? 2008-09-30 20:29 not lru 2008-09-30 20:29 radix tree 2008-09-30 20:29 badly named function here 2008-09-30 20:29 very bad 2008-09-30 20:29 it means "add to page cache and also to lru" 2008-09-30 20:29 not add to lru 2008-09-30 20:30 aaaa 2008-09-30 20:30 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L490 2008-09-30 20:30 right, only adding to cache can fail 2008-09-30 20:30 got a rul for the EEXIST test? 2008-09-30 20:30 url? 
2008-09-30 20:30 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2055 2008-09-30 20:31 mem_cgroup_uncharge_page <- wow, 75 cent name 2008-09-30 20:31 shapor, this should get a rise out of the hotrodder in you 2008-09-30 20:31 memory accounting for containerization 2008-09-30 20:31 hah 2008-09-30 20:32 nitro chardged pages 2008-09-30 20:32 maze, thanks 2008-09-30 20:32 for? 2008-09-30 20:32 for the comment re containers 2008-09-30 20:32 oh. 2008-09-30 20:32 explains why I haven't seen the beast before 2008-09-30 20:32 crappy name 2008-09-30 20:32 therefore fits nicely ;) 2008-09-30 20:33 cgroup is the containers stuff, both cpu and mem 2008-09-30 20:33 uncharge must be in the release page path, and charge in the alloc path 2008-09-30 20:33 yeah this unfortunately all gets pretty complex 2008-09-30 20:33 because we're supporting numa and containers 2008-09-30 20:34 why does this matter for tux3 2008-09-30 20:34 all right, the EEXIST is about what happens if somebody adds the page while we are waiting to acquire the radix tree lock 2008-09-30 20:34 ok? 2008-09-30 20:34 i thoguht we were intentially avoiding the vm 2008-09-30 20:34 page cache is vfs, not vm 2008-09-30 20:34 numa = non-uniform memory access machines (multi-socket machines) and containers (good for jails/vms/isolating users/apps, etc...) 2008-09-30 20:34 flips: right, hence my comment about not having all the locks in smp 2008-09-30 20:35 shapor, we only need to know to recognize what is mm and therefore can be ignored ;) 2008-09-30 20:35 ok 2008-09-30 20:35 maze, right then 2008-09-30 20:35 as usual ;) 2008-09-30 20:35 you get to run the next class ;) 2008-09-30 20:35 well 2008-09-30 20:35 no... 2008-09-30 20:35 got to wait and see what you hack next 2008-09-30 20:35 ACTION runs away... 2008-09-30 20:35 flips: did you feel the mini quake a while ago? 
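The lookup, allocate, insert and retry sequence walked through above (find_lock_page() finding nothing, page_cache_alloc() falling through to alloc_pages(gfp, 0), then add_to_page_cache_lru() possibly failing with -EEXIST and jumping back to repeat) can be modelled in userspace. This is a hypothetical toy, not kernel code: toy_mapping, toy_find_page, toy_add_page and toy_grab_page are invented names, a fixed array stands in for the radix tree, malloc() stands in for page_cache_alloc(), and there is no locking, so the -EEXIST branch only marks where a racing insert would be handled.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical toy: a fixed array stands in for the per-inode radix tree. */
#define TOY_SLOTS 64

struct toy_page {
    unsigned long index;              /* page offset within the file */
};

struct toy_mapping {                  /* plays the role of struct address_space */
    struct toy_page *slot[TOY_SLOTS];
};

/* Lookup only, like find_lock_page() without the locking: NULL if absent. */
static struct toy_page *toy_find_page(struct toy_mapping *m, unsigned long index)
{
    return index < TOY_SLOTS ? m->slot[index] : NULL;
}

/* Insert, like add_to_page_cache_lru(): -EEXIST if the slot is already taken. */
static int toy_add_page(struct toy_mapping *m, struct toy_page *page,
                        unsigned long index)
{
    if (index >= TOY_SLOTS)
        return -ENOMEM;               /* toy capacity limit */
    if (m->slot[index])
        return -EEXIST;               /* someone else inserted it first */
    page->index = index;
    m->slot[index] = page;
    return 0;
}

/* Find-or-create loop in the spirit of __grab_cache_page(). */
static struct toy_page *toy_grab_page(struct toy_mapping *m, unsigned long index)
{
    for (;;) {
        struct toy_page *page = toy_find_page(m, index);
        if (page)
            return page;              /* already cached */

        page = malloc(sizeof *page);  /* not cached: allocate a fresh page */
        if (!page)
            return NULL;

        int err = toy_add_page(m, page, index);
        if (err == 0)
            return page;              /* our page is now in the cache */

        free(page);                   /* insert failed: drop our copy */
        if (err != -EEXIST)
            return NULL;
        /* -EEXIST: lost a race, so loop and pick up the winner's page */
    }
}
```

Calling toy_grab_page() twice with the same index returns the same page the second time, which is the whole point of caching per mapping.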
2008-09-30 20:35 shapor, no, missed it 2008-09-30 20:35 didn't feel anything up here 2008-09-30 20:36 we had a great one a month or two ago 2008-09-30 20:36 got the familly up and huddled under a door jamb 2008-09-30 20:36 anyway 2008-09-30 20:36 life in paradise 2008-09-30 20:36 right 2008-09-30 20:36 duck'n'cover 2008-09-30 20:36 ;-) 2008-09-30 20:36 where were we 2008-09-30 20:37 we've nearly done everything interesting in there 2008-09-30 20:37 yes, we did get sidetracked a little. 2008-09-30 20:37 http://lxr.linux.no/linux+v2.6.26.5/mm/filemap.c#L2055 2008-09-30 20:37 sorry ;) 2008-09-30 20:37 details of radix tree aren't that interesting 2008-09-30 20:37 could we have a little more info on mapping? 2008-09-30 20:37 the channel topic says "and friends" 2008-09-30 20:37 ok, question? 2008-09-30 20:37 who fills it in and what it is for? 2008-09-30 20:37 who fills in the mapping? 2008-09-30 20:37 2066 struct address_space *mapping = file->f_mapping; 2008-09-30 20:38 so it comes from the file, so the vfs? 2008-09-30 20:38 it's just the per-inode page cache 2008-09-30 20:38 vfs usually 2008-09-30 20:38 though filesystem can too, and some do 2008-09-30 20:38 the fs has access to the whole misshapen page cache api 2008-09-30 20:38 for better or worse 2008-09-30 20:38 you will see all the functions are EXPORT()ed 2008-09-30 20:39 not even _GPL 2008-09-30 20:39 you can write evil/fringer binary modules that use this interface 2008-09-30 20:39 fringe 2008-09-30 20:39 ok, did we do who fills it in enough? 
2008-09-30 20:40 probably 2008-09-30 20:40 still don't get it ;-) but nevermind 2008-09-30 20:40 then we didn't 2008-09-30 20:40 http://lxr.linux.no/linux+v2.6.26.5/include/linux/fs.h#L499 2008-09-30 20:40 the mapping _is_ the page cache 2008-09-30 20:40 is where the struct is defined 2008-09-30 20:41 so the page cache 2008-09-30 20:41 it's basically one to one with file inodes 2008-09-30 20:41 is actually not a page cache, but rather a page cache per inode per superblock 2008-09-30 20:41 exactly 2008-09-30 20:41 just per inode 2008-09-30 20:41 as in not _a_ but _one per_ 2008-09-30 20:41 one per inode 2008-09-30 20:41 actually 2008-09-30 20:41 one per file-backed inode 2008-09-30 20:42 okay, it's always talked of as if it was one beast 2008-09-30 20:42 to be precise 2008-09-30 20:42 yes, that's just sloppy 2008-09-30 20:42 non file-backed inode being non-file/dir stuff? (sockets, pipes, symlinks, fifos, devs?) 2008-09-30 20:42 while we're looking at struct address_space (which really should have been called struct mapping) 2008-09-30 20:42 let's look at some of the fields there 2008-09-30 20:43 device inode, socket, etc 2008-09-30 20:43 so 4k per inode? 2008-09-30 20:43 everything is an inode, and not all have caches 2008-09-30 20:43 er at least 2008-09-30 20:43 at least 2008-09-30 20:43 why 4k? 2008-09-30 20:43 inodes are big bloating things, especially when they have all their decorations attached 2008-09-30 20:44 shapor, which 4k were you referring to? 2008-09-30 20:44 at least one page gets allocated, right? 2008-09-30 20:44 size of a page 2008-09-30 20:44 for what? 2008-09-30 20:44 for the struct? 2008-09-30 20:44 shapor, probably not 2008-09-30 20:44 I'm sure we use a slab-cache of some sort? 2008-09-30 20:45 some inodes could lack any page cache, right? 
2008-09-30 20:45 ok i read what flips said wrong 2008-09-30 20:45 shapor, look at the inode, you will see there's an address space embedded right in it 2008-09-30 20:45 kind of confusing 2008-09-30 20:45 hmm 2008-09-30 20:45 where is that defined/ 2008-09-30 20:46 624 struct address_space i_data; 2008-09-30 20:46 http://lxr.linux.no/linux+v2.6.26.5/include/linux/fs.h#L623 2008-09-30 20:46 http://lxr.linux.no/linux+v2.6.26.5/include/linux/fs.h#L624 2008-09-30 20:46 yeah i found it 2008-09-30 20:46 cat /proc/slabinfo - 3rd number means objsize 2008-09-30 20:46 radix_tree_node 21273 21294 560 14 2 : tunables 0 0 0 : slabdata 1521 1521 0 2008-09-30 20:46 bdev_cache 43 63 768 21 4 : tunables 0 0 0 : slabdata 3 3 0 2008-09-30 20:46 sysfs_dir_cache 11921 12189 80 51 1 : tunables 0 0 0 : slabdata 239 239 0 2008-09-30 20:46 inode_cache 4952 4970 568 14 2 : tunables 0 0 0 : slabdata 355 355 0 2008-09-30 20:46 dentry 486172 486172 208 19 1 : tunables 0 0 0 : slabdata 25588 25588 0 2008-09-30 20:47 razvanm, that's the pointer to it, which points into the address space itself, immediately after 2008-09-30 20:47 why the mapping has to be a pointer isn't clear to me 2008-09-30 20:47 probably bogus 2008-09-30 20:47 flips: that was what I about to ask :P 2008-09-30 20:47 razvanm, it's the homework assignment then, to find out by thursday 2008-09-30 20:47 in case you don't want that one? 2008-09-30 20:48 maze, sure, but what use case? 2008-09-30 20:48 share across many inodes? 2008-09-30 20:48 I suspect a highly dogdy one 2008-09-30 20:48 like cow 2008-09-30 20:48 aaa... hard links? :P 2008-09-30 20:48 nah, hard links -> same inode 2008-09-30 20:48 never assume that what we do in kernel actually makes sense ;) 2008-09-30 20:48 soft link? 
:-) 2008-09-30 20:48 often things are the way they are just because they are 2008-09-30 20:48 and that is eventually proven when somebody rips it out and changes it completely 2008-09-30 20:49 616 spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */ 2008-09-30 20:49 how can a lock maybe protect a field? 2008-09-30 20:49 maybe not everybody follow the rules? 2008-09-30 20:49 there's a lot of weirdness around i_size locking wise 2008-09-30 20:50 maze, likely more bogosity, and the cause of thousands of hours worth of bug chasing the last few years 2008-09-30 20:50 anyway, so our inode, including the mapping, seems to use about 570 bytes 2008-09-30 20:50 jsut for starters 2008-09-30 20:50 then you get a couple of pages linked to it... dentries... it gets bloaty 2008-09-30 20:50 inode_cache 162 264 340 11 1 : tunables 54 27 8 : slabdata 24 24 0 2008-09-30 20:51 dentry per hardlink to the inode right? 2008-09-30 20:51 per path element used to open the inode 2008-09-30 20:51 + radix_tree_nodes (and actual pages) for the parts that are in memory? 2008-09-30 20:51 right, but those higher levels are shared 2008-09-30 20:51 n directory names and one file name 2008-09-30 20:51 maybe 2008-09-30 20:52 ? 2008-09-30 20:52 not necessarily 2008-09-30 20:52 except for root 2008-09-30 20:52 you can easily have lots of very unshared paths 2008-09-30 20:52 like in a java class tree 2008-09-30 20:52 bushy 2008-09-30 20:52 in what sense not necessarily, as in there may not be any other files using the same prefix? 2008-09-30 20:52 right 2008-09-30 20:52 or we can have the same prefix and still not share? 2008-09-30 20:52 ah, ok 2008-09-30 20:52 right, of course, then 2008-09-30 20:53 but dentries are pretty small (200 bytes) 2008-09-30 20:53 just by way of showing that the average pinned cache per inode can be quite large 2008-09-30 20:53 you also typically have a struct file 2008-09-30 20:53 if its open 2008-09-30 20:53 right? 
2008-09-30 20:53 so file->dentry->inode->pages 2008-09-30 20:53 yes 2008-09-30 20:54 all the dentries up to the root and the inode, and the radix tree have to be in ram, as long as we are using the page cache from that inode, right? 2008-09-30 20:54 destroyed when closed always? 2008-09-30 20:54 when stuff is just hanging in cache you have dentry->inode->pages 2008-09-30 20:54 that's with a closed file? 2008-09-30 20:54 struct file always destroyed on close, dentry not 2008-09-30 20:54 yes 2008-09-30 20:55 right, but eventually it would get evicted, since the close would flush? 2008-09-30 20:55 maze, close doesn't flush 2008-09-30 20:55 only umount evicts like that 2008-09-30 20:55 really? 2008-09-30 20:55 well 2008-09-30 20:55 close does not evict 2008-09-30 20:55 I thought close was guaranteed to give you back any error messages 2008-09-30 20:55 or flush actually 2008-09-30 20:55 in case you were to run out of disk, etc 2008-09-30 20:56 and thus close had to wait for a flush? 2008-09-30 20:56 hmm 2008-09-30 20:56 doubt that 2008-09-30 20:56 if close was equivalent to fsync performance would tank 2008-09-30 20:57 so close may be defined that way, the filesystem does not have to implement it that way 2008-09-30 20:57 hm but if dentries get purged from you dont lose the cache right? 2008-09-30 20:57 man 2 open 2008-09-30 20:57 It is quite possible that errors on a pre- 2008-09-30 20:57 vious write(2) operation are first reported at the final close(). Not 2008-09-30 20:57 checking the return value when closing the file may lead to silent loss 2008-09-30 20:57 of data. This can especially be observed with NFS and with disk quota. 2008-09-30 20:57 A successful close does not guarantee that the data has been success- 2008-09-30 20:57 fully saved to disk, as the kernel defers writes. It is not common for 2008-09-30 20:57 a filesystem to flush the buffers when the stream is closed. If you 2008-09-30 20:57 need to be sure that the data is physically stored use fsync(2). 
(It 2008-09-30 20:57 will depend on the disk hardware at this point.) 2008-09-30 20:57 shapor, dentries stay around as long as the cache does 2008-09-30 20:57 so, close can return errors, but it still doesn't flush unless you fsync 2008-09-30 20:57 right 2008-09-30 20:58 cute 2008-09-30 20:58 It is probably unwise to close file descriptors while they may be in 2008-09-30 20:58 use by system calls in other threads in the same process. Since a file 2008-09-30 20:58 descriptor may be re-used, there are some obscure race conditions that 2008-09-30 20:58 may cause unintended side effects. 2008-09-30 20:58 two minutes 2008-09-30 20:59 my girl has decided daddy needs to play with her 2008-09-30 20:59 :-) 2008-09-30 20:59 she doesn't like the linux kernel (yet)? 2008-09-30 20:59 maze, we have fixed most of those races 2008-09-30 20:59 couple were fixed this year 2008-09-30 20:59 not yet 2008-09-30 20:59 I like the sound of confidence there... 2008-09-30 20:59 working on it 2008-09-30 20:59 ...most... 2008-09-30 20:59 file table is a nasty thing 2008-09-30 20:59 race wise 2008-09-30 21:00 but yes, the known holes are closed now 2008-09-30 21:00 remember I told you fget_light was the most perverse function in the kernel? 
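The close(2) man page excerpt above reduces to a pattern: fsync() is what actually forces data out, and close() is still checked because it can report a deferred write error (the NFS and disk quota cases it mentions). A minimal POSIX sketch, assuming a plain local file; write_durably is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Write buf to path; return 0 only if every step, including fsync() and
 * close(), succeeded. close() alone does not flush, but it may still report
 * an error from an earlier deferred write, so its result is checked too. */
static int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted before writing: retry */
            perror("write");
            close(fd);
            return -1;
        }
        p += n;                       /* short write: continue with the rest */
        len -= (size_t)n;
    }

    if (fsync(fd) < 0) {              /* the step that actually forces a flush */
        perror("fsync");
        close(fd);
        return -1;
    }
    if (close(fd) < 0) {              /* may surface an error from earlier I/O */
        perror("close");
        return -1;
    }
    return 0;
}
```

Skipping the fsync() keeps the fast path that the log says filesystems rely on; the cost is that a successful close() then says nothing about whether the data is on disk.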
2008-09-30 21:00 interesting that closing an fd drops locks on the file even if you have duped it to another fd 2008-09-30 21:01 indeed 2008-09-30 21:01 second homework is to find out why 2008-09-30 21:01 ok, first home work was why both ptr and struct address_space (*i_mapping and i_data) in struct inode 2008-09-30 21:01 right 2008-09-30 21:02 well we did grab_cache_page pretty well, did not get to the friends 2008-09-30 21:02 gives a starting point for next time if we want 2008-09-30 21:07 can lxr do regexp search 2008-09-30 21:17 coda, raw, bdev 2008-09-30 21:17 quantum electro-dynamics 2008-09-30 21:18 ACTION goes to bed 2008-09-30 21:19 lol 2008-09-30 21:23 -!- Kirantpatil(~kiran@122.167.219.78) has left #tux3 2008-09-30 21:27 yeah, I typed /nick instead of /me in '/nick says thanks for the lesson :D' 2008-09-30 21:27 I've looked at fget_light, and it doesn't seem that scary... 2008-09-30 21:27 something's wrong with me 2008-09-30 21:27 or I'm missing the point 2008-09-30 21:28 or both ;-) 2008-09-30 21:33 linux-2.6.26.5$ egrep -rn -C 2 "[-][>]i_mapping *=" . 2008-09-30 21:47 folks 2008-09-30 21:47 hmm 2008-09-30 21:49 -!- Kirantpatil(~kiran@122.167.219.78) has joined #tux3 2008-09-30 21:50 -!- Kirantpatil(~kiran@122.167.219.78) has left #tux3 2008-09-30 21:55 maze, you haven't spotted it yet 2008-09-30 21:56 hmm? the fact it uses rcu? and doesn't always increment usage counters, nor does it always call fput? 2008-09-30 21:56 oh does it use rcu now? 2008-09-30 21:56 yeah 2008-09-30 21:57 struct files_struct *files = current->files; 2008-09-30 21:57 is protected by rcu 2008-09-30 21:57 although I'm guessing in many cases the cu is partial and not full 2008-09-30 21:57 fget itself isn't though 2008-09-30 21:58 "You can use this only if it is guranteed that the current task already 2008-09-30 21:58 fget is copied verbatim into fget_light 2008-09-30 21:58 313 * holds a refcnt to that file. 
That check has to be done at fget() only 2008-09-30 21:58 314 * and a flag is returned to be passed to the corresponding fput_light()" 2008-09-30 21:58 in other words, if the current task drops its reference... well it can't 2008-09-30 21:59 and there is no way an external observer can tell that the file is held by fget_light 2008-09-30 21:59 for starters 2008-09-30 21:59 right 2008-09-30 21:59 hence the 'doesn't always increment usage counters' 2008-09-30 21:59 but since, you're already holding the refcnt, it doesn't matter 2008-09-30 21:59 hence line 312 2008-09-30 22:00 and in cases where that doesn't work (threads), it falls back to using full fget 2008-09-30 22:00 oh wait, it does use rcu now 2008-09-30 22:00 used to be much worse 2008-09-30 22:01 wait again 2008-09-30 22:01 it uses rcu on the slow path 2008-09-30 22:02 right 2008-09-30 22:02 but the slow path is actually pretty common 2008-09-30 22:02 -> threads 2008-09-30 22:02 or anything that through clone ended up with shared fd table 2008-09-30 22:04 if the fd table isn't shared, then there is no need for locking, since it's local to this task 2008-09-30 22:04 and we're running in this tasks context 2008-09-30 22:04 /as this task/ 2008-09-30 22:05 otherwise, we need to synchronize via rcu with other tasks which share our fd table 2008-09-30 22:06 it may become shared 2008-09-30 22:06 after the fget_light 2008-09-30 22:06 nope 2008-09-30 22:06 notice the comment 2008-09-30 22:07 cannot be used if clone before fput_light 2008-09-30 22:07 315 * There must not be a cloning between an fget_light/fput_light pair. 2008-09-30 22:07 that's basically the only case were you are not allowed to use fget_light 2008-09-30 22:07 starting to see the perversity? 2008-09-30 22:07 hmm? 2008-09-30 22:07 doesn't seem perverse 2008-09-30 22:07 seems pretty clean 2008-09-30 22:08 oh... 
hmm 2008-09-30 22:08 I'm just worried that it may not be worth the effort with how multithreaded nowadays everything is getting 2008-09-30 22:09 (of course threads, don't necessarily share fd tables, but in most languages they probably do) 2008-09-30 22:09 isn't tux3 the works of satan ? 2008-09-30 22:09 I thought rcu was supposed to be pretty efficient... I wonder how much this gains in a single-thread case 2008-09-30 22:10 ACTION reads the backlog 2008-09-30 22:10 what I was thinking 2008-09-30 22:11 flips: you saw? linux-2.6.26.5$ egrep -rn -C 2 "[-][>]i_mapping *=" . --> blockdev, raw char dev, coda -> basically my guess was right 2008-09-30 22:11 ah, didn't notice you were already doing the challenge 2008-09-30 22:12 raw char dev maps on block dev, so remaps mapping to blockdevs mapping to share page cache, coda does hackery in case localfs is exported (AFAICT) 2008-09-30 22:12 and then re-imported via coda 2008-09-30 22:13 to share page cache between the codafs import and the original export 2008-09-30 22:13 at least, that's my guess 2008-09-30 22:13 how did you guess raw char dev, coda? 2008-09-30 22:14 linux-2.6.26.5$ egrep -rn -C 2 "[-][>]i_mapping *=" . 2008-09-30 22:14 ah, a computerized guess 2008-09-30 22:14 not really a guess 2008-09-30 22:14 so the short answer is: when the cache must be shared between inodes 2008-09-30 22:14 the guess was earlier, when I said multiple inodes with the same mapping 2008-09-30 22:14 but it isn't clear whether the sharing cases are valid 2008-09-30 22:15 the coda case at least isn't clear 2008-09-30 22:15 now what about the raw char dev? 2008-09-30 22:15 why should that have a cache at all? 2008-09-30 22:15 raw char dev is basically opening block dev with O_DIRECT 2008-09-30 22:15 and is the ancient way to do it 2008-09-30 22:15 oh 2008-09-30 22:15 raw dev 2008-09-30 22:15 so the raw char dev case maps in the mapping from the block dev 2008-09-30 22:16 seems wrong somehow 2008-09-30 22:16 in what sense? 
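The fget_light() discussion above comes down to one decision: if the fd table is not shared (files->count == 1), the current task's own reference already keeps the file alive, so no per-call reference is taken; otherwise a real reference is taken and fput_light() drops it. The real code does the shared-table lookup under RCU with an atomic increment. What follows is a single-threaded toy of just that decision, with hypothetical names, no RCU or atomics, and no attempt to enforce the "no cloning between an fget_light/fput_light pair" rule.

```c
#include <stddef.h>

#define TOY_NR_FDS 16

/* Hypothetical toy types, not the kernel's struct file / files_struct. */
struct toy_file {
    int f_count;                      /* references held on this open file */
};

struct toy_files {
    int count;                        /* tasks sharing this fd table */
    struct toy_file *fd[TOY_NR_FDS];
};

static struct toy_file *toy_fget_light(struct toy_files *files, unsigned int fd,
                                       int *fput_needed)
{
    struct toy_file *file;

    *fput_needed = 0;
    if (fd >= TOY_NR_FDS)
        return NULL;
    file = files->fd[fd];
    if (files->count == 1)
        return file;                  /* unshared table: nobody else can close fd */
    if (file && file->f_count > 0) {  /* shared table: take a real reference */
        file->f_count++;
        *fput_needed = 1;
        return file;
    }
    return NULL;                      /* no file, or it is already being freed */
}

static void toy_fput_light(struct toy_file *file, int fput_needed)
{
    if (fput_needed)
        file->f_count--;              /* drop only the reference we took */
}
```

With an unshared table the pair costs nothing beyond the lookup, which is the single-threaded gain being questioned in the log; with a shared table it behaves like an ordinary fget()/fput().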
2008-09-30 22:16 why doesn't it just return the device inode? 2008-09-30 22:16 use that when you open the raw device 2008-09-30 22:16 probably because it's a raw char not a block dev 2008-09-30 22:16 and behaviour is different? 2008-09-30 22:17 so can't get there from here maybe 2008-09-30 22:17 hmm 2008-09-30 22:17 or the raw char dev should point at the device inode 2008-09-30 22:17 not at the mapping 2008-09-30 22:18 you'll note that for some reason it's not a raw block dev, but a raw char dev, so probably alignment issues and ioctls force it to have a shim-layer 2008-09-30 22:18 right, but it is not clear it can't reference the block device inode 2008-09-30 22:18 haven't looked at that thing at all 2008-09-30 22:18 some sort of ancientness, nowadays raw char devs are close to getting dropped I think 2008-09-30 22:18 always used o_direct instead 2008-09-30 22:19 exactly 2008-09-30 22:20 coda... who knows 2008-09-30 22:20 doing stacking on a vfs that wasn't designed for it is going to be fun 2008-09-30 22:24 http://lkml.org/lkml/2003/5/2/157 2008-09-30 22:24 (maze) 2008-09-30 22:33 hmm 2008-09-30 22:33 coda... 
who knows - not really 2008-09-30 22:33 it's a network filesystem with local caching and offline operation 2008-09-30 22:33 pretty obvious it needs to tie the codafs inodes with the local backing store inodes 2008-09-30 22:35 the one you found is just an inlining of fput_light 2008-09-30 22:36 or rather of the first if in it 2008-09-30 22:37 sorry, me bad 2008-09-30 22:38 earlier on in that thread 2008-09-30 22:38 and even then those comparisons are before rcu 2008-09-30 22:40 so not clear what the gain is with rcu 2008-09-30 22:40 as opposed to normal r/w locks like before 2008-09-30 23:04 maze, what is not obvious is why it can't keep references to the backing store inodes 2008-09-30 23:05 instead of sharing the inode's cache with its own inodes 2008-09-30 23:07 then there is assoc_mapping 2008-09-30 23:12 assoc_mapping is only used by sync_mapping_buffers, which is only used by brainless filesystems like ext2 2008-09-30 23:12 is used incorrectly it would seem in reiserfs 2008-09-30 23:13 ocfs2 also uses it, perhaps because mark overlooked it 2008-09-30 23:14 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-09-30 23:35 -!- Kirantpatil(~kiran@122.166.169.45) has joined #tux3 2008-09-30 23:35 -!- Kirantpatil(~kiran@122.166.169.45) has left #tux3 2008-09-30 23:39 I'm guessing it does keep reference to the backing store inodes 2008-09-30 23:39 but I'm also guessing that it wants access to the coda file to hit the same pagecache as the backing file, isn't that easiest to do by having the same pagecache, by having the i_mapping pointer point to the same location? 
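The egrep result above (block devices, raw char devices, coda) explains why struct inode carries both an embedded struct address_space i_data and a struct address_space *i_mapping pointer: normally i_mapping points at the inode's own i_data, but a driver can point it at another inode's cache so that both inodes share one set of cached pages. A toy model with hypothetical types, not the real struct inode:

```c
/* Hypothetical toy types, not the kernel's struct inode / address_space. */
struct toy_address_space {
    unsigned long nrpages;                /* pages cached in this mapping */
};

struct toy_inode {
    struct toy_address_space i_data;      /* embedded: this inode's own cache */
    struct toy_address_space *i_mapping;  /* the cache actually used for I/O */
};

/* Default case: an inode uses its own embedded cache. */
static void toy_inode_init(struct toy_inode *inode)
{
    inode->i_data.nrpages = 0;
    inode->i_mapping = &inode->i_data;
}

/* Sharing case (raw char dev, bdev, coda style): send I/O on 'alias'
 * to the cache that belongs to 'owner'. */
static void toy_share_mapping(struct toy_inode *alias, struct toy_inode *owner)
{
    alias->i_mapping = owner->i_mapping;
}

/* Page cache code only ever goes through inode->i_mapping. */
static void toy_cache_one_page(struct toy_inode *inode)
{
    inode->i_mapping->nrpages++;
}
```

After sharing, a page cached through the alias shows up in the owner's i_data, while the alias's own embedded i_data simply goes unused.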
2008-10-01 00:00 I'd think just has easy to redirect each sys_* call to a different inode 2008-10-01 00:00 well I haven't tried it 2008-10-01 00:00 have not looked nearly deep enough to know 2008-10-01 00:01 one day when there's nothing else to do 2008-10-01 00:01 the arrangement does look a little less than elegant 2008-10-01 00:02 kay, time to explain why extents just go so much work 2008-10-01 00:25 flips: is the 16 terabyte limitation for a file a problem ? 2008-10-01 00:25 I can image some situation where a DB might need a file that size or larger 2008-10-01 00:26 yes, it's uncomfortably small at a time where 1TB disks are $100 and it doubles every 18 months 2008-10-01 00:26 let alone storage arrays 2008-10-01 00:27 in six years, $100 will get you 16TB 2008-10-01 00:31 -!- Kirantpatil(~kiran@122.167.192.43) has joined #tux3 2008-10-01 00:32 http://lwn.net/Articles/194869/ <- the original patch for extents in ext4 2008-10-01 00:34 -!- bobby(~bobby@nat-inn.mentorg.com) has joined #tux3 2008-10-01 00:35 hello 2008-10-01 00:35 anyone here? 2008-10-01 00:35 flips: how's tux3 going overall ? well IYO ? 2008-10-01 00:35 hi pranith 2008-10-01 00:35 hello flips 2008-10-01 00:35 it's getting more like a filesystem every day 2008-10-01 00:35 seems like extents were a bitch to pull together, bug fixing and all 2008-10-01 00:36 indeed 2008-10-01 00:36 pranith, I will probably put the fuse ls fix in tomorrow or the next day 2008-10-01 00:36 flips, the groups table and second level entry table are stored in reverse. any reason for that? 2008-10-01 00:36 flips, you found the problem? 
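The 16 TB per-file figure is presumably the usual product of a 32-bit block or page index and 4 KiB blocks: 2^32 × 2^12 = 2^44 bytes = 16 TiB, the same ceiling a 32-bit page cache index hits. The six-year projection is whole doublings: 72 months at one doubling per 18 months is 4 doublings, so 1 TB becomes 16 TB. A small C sketch of both calculations; the helper names are hypothetical.

```c
#include <stdint.h>

/* Largest file addressable by an index of index_bits bits counting blocks
 * of 2^block_shift bytes (for example a 32-bit index and 4 KiB blocks).
 * Assumes index_bits + block_shift < 64. */
static uint64_t max_file_bytes(unsigned index_bits, unsigned block_shift)
{
    return (uint64_t)1 << (index_bits + block_shift);
}

/* Capacity after 'months' if it doubles every 'doubling_months' months,
 * counting whole doublings only. */
static uint64_t projected_capacity(uint64_t start, unsigned months,
                                   unsigned doubling_months)
{
    return start << (months / doubling_months);
}
```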
2008-10-01 00:37 pranith, no but I will 2008-10-01 00:37 :) 2008-10-01 00:37 ok 2008-10-01 00:37 flips: that implementation looked pretty big and complicated 2008-10-01 00:37 pranith, the groups and entries grow down towards the extent table, therefore it is more efficient to store them in reverse 2008-10-01 00:38 that way, appending a new entry at the end does not require moving all the other entries 2008-10-01 00:38 and the computer doesn't care which way they are stored, only the poor unfortunately programmers who have to try to understand that code 2008-10-01 00:39 i dont understand what u mean by reverse here... 2008-10-01 00:39 the entries are growing from top to bottom 2008-10-01 00:39 yes, I call that reverse 2008-10-01 00:39 hmm 2008-10-01 00:40 isn't that the usual way to do it? 2008-10-01 00:40 top to bottom 2008-10-01 00:40 flips: in general, it's a good thing that all of these file systems are getting development 2008-10-01 00:40 btrfs, ext3 and tux3 2008-10-01 00:40 pranith, I don't know about usual, but btrfs does something similar 2008-10-01 00:40 flips, ohk 2008-10-01 00:40 maybe one of then can get us out of the Linux file system dark ages 2008-10-01 00:40 then=them 2008-10-01 00:40 bh, I am awed by the epic nature of the ext4 extents implementation 2008-10-01 00:41 I am sure there are good reasons for everything 2008-10-01 00:41 flips: it's huge, do you think it needs to be that way ? 2008-10-01 00:41 struct dleaf { u16 magic, free, used, groups;struct extent table[];} 2008-10-01 00:41 well, mine is about 400 lines, maybe has a 100 lines more to go 2008-10-01 00:41 so no 2008-10-01 00:41 maybe they're working around legacy issues or something with that allocator, don't know unless I look at the patch more carefully 2008-10-01 00:42 flips, magic? 
2008-10-01 00:42 the magic number is to do an in-memory check that you really have got a dleaf when you think you have 2008-10-01 00:42 ACTION takes a longer look at the patch 2008-10-01 00:42 where are the entries in the dleaf struct? 2008-10-01 00:43 I think the magic number is 0x1eaf 2008-10-01 00:43 i mean u dint use struct group and struct entry there 2008-10-01 00:43 pranith, extents at the bottom of the leaf are indexed by dictionary entries at the top of the leaf 2008-10-01 00:44 ACTION looking at the code 2008-10-01 00:44 a picture would be most helpful 2008-10-01 00:45 anybody good with graphics? 2008-10-01 00:45 flips, ohk, these dict entries are formed in dwalk_probe 2008-10-01 00:45 dwalk_probe does a lookup 2008-10-01 00:46 they are created by dwalk_pack 2008-10-01 00:46 that's for the extent code 2008-10-01 00:46 ok, looking at that 2008-10-01 00:46 the existing pointer code uses dleaf_lookup to find an entry and dleaf_resize to create space for a new one 2008-10-01 00:47 ok 2008-10-01 00:51 oh look, ext4 has an extent walk too: ext4_ext_walk_space 2008-10-01 00:53 and uses the word "gap" 2008-10-01 00:54 bh, I think the reason the ext4 extent code is so long is, it actually includes all the ext4 btree indexing code too 2008-10-01 00:55 I'm looking over the patch right now 2008-10-01 00:55 A good chunk of the complexity is with tree manipulation which you largely have already in your b-tree implementation I believe 2008-10-01 00:56 flips, leaf_free function is a bit confusing.. shouldn't that just return to_dleaf(leaf)->free?? 2008-10-01 00:56 there's an extent merge operation which I don't know yet where it's used and how 2008-10-01 00:56 pranith, no, I used some confusing field names there 2008-10-01 00:57 flips: I didn't see the b-tree stuff in there, but I saw a bunch of stuff that looked like generic tree manipulation, no b-tree 2008-10-01 00:57 flips, whats free for? and whats used for? 
2008-10-01 00:57 I'm about 2/5th through so far, nearing the halfway point 2008-10-01 00:57 dleaf->free is the offset within the dleaf of the top of the extents table, dleaf->used is offset of the bottom of the entries list 2008-10-01 00:57 bh, truee 2008-10-01 00:57 the "extent tree" does not appear to be a btree 2008-10-01 00:58 pranith, it's used to know when there is space to add a new extent to the leaf, or if the leaf has to be split 2008-10-01 00:58 hmm 2008-10-01 01:01 leaf->groups contains the number of groups currently in the leaf? 2008-10-01 01:01 yes 2008-10-01 01:02 flips: they have more operations for modifying the extents, but that's about it. It doesn't look that bad regarding complexity 2008-10-01 01:02 -!- _ajonat(~ajonat@190.48.125.115) has joined #tux3 2008-10-01 01:02 does tux3 have similar operations yet ? 2008-10-01 01:02 something like that 2008-10-01 01:02 something more powerful imho 2008-10-01 01:02 flips: I think it's just about general manipulation of file system data structures, nothing more than that 2008-10-01 01:03 looks like 2008-10-01 01:03 it's a lot of code 2008-10-01 01:03 hard to understand the structure in one reading 2008-10-01 01:03 no comments... 2008-10-01 01:03 I think it's just because of it's legacy roots with ext3 which is what this patch is aimed at work around 2008-10-01 01:03 flips: it wasn't hard, you'd be able to get through it in a concentrated day 2008-10-01 01:03 there's only one really large core routine 2008-10-01 01:04 but I got the general idea of what was going on 2008-10-01 01:05 if tux3 uses the b-tree for this already then you don't have to go through the process of "fitting" extents to the disk, etc... 2008-10-01 01:05 and other optimizations with allocation 2008-10-01 01:06 I wonder how they get away with 32 bits for the logical block number of the extent 2008-10-01 01:06 the only worry about tux3 that I have is the lack of kernel support since there's logic manipulation buffer states, ec... 
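The dleaf layout described above (extents at the bottom of the leaf, dictionary entries growing down from the top, free marking the top of the extent table and used marking the bottom of the entry list) implies a simple space check before each insert. This sketch assumes field semantics and sizes based only on that description: toy_dleaf, the 4096-byte leaf and the placeholder extent/entry structs are hypothetical, groups are not modelled, and it is not tux3's dleaf.c.

```c
#include <stdint.h>
#include <string.h>

#define TOY_LEAF_SIZE 4096
#define TOY_DLEAF_MAGIC 0x1eaf            /* value mentioned in the log */

/* Header fields as in the log's struct dleaf; semantics assumed:
 * free = offset just past the last extent, used = offset of lowest entry. */
struct toy_dleaf {
    uint16_t magic, free, used, groups;
};

struct toy_extent { uint64_t block_and_count; };  /* placeholder payload */
struct toy_entry { uint32_t key_and_limit; };     /* placeholder payload */

static void toy_dleaf_init(void *leaf)
{
    struct toy_dleaf *h = leaf;
    h->magic = TOY_DLEAF_MAGIC;
    h->free = sizeof *h;                  /* no extents yet */
    h->used = TOY_LEAF_SIZE;              /* no entries yet */
    h->groups = 0;
}

/* Bytes left between the top of the extents and the bottom of the entries. */
static unsigned toy_dleaf_gap(const struct toy_dleaf *h)
{
    return h->used - h->free;
}

/* Append one extent and one entry: 0 on success, 1 if the leaf is full
 * (the caller would split it), -1 if the magic check fails. */
static int toy_dleaf_append(void *leaf, const struct toy_extent *ex,
                            const struct toy_entry *en)
{
    struct toy_dleaf *h = leaf;
    unsigned char *base = leaf;

    if (h->magic != TOY_DLEAF_MAGIC)
        return -1;                        /* not a dleaf: refuse to touch it */
    if (toy_dleaf_gap(h) < sizeof *ex + sizeof *en)
        return 1;
    memcpy(base + h->free, ex, sizeof *ex);
    h->free += sizeof *ex;                /* extents grow upward */
    h->used -= sizeof *en;                /* entries grow downward */
    memcpy(base + h->used, en, sizeof *en);
    return 0;
}
```

Because each new entry is written at a lower offset than the previous one, appending never moves the entries already in the leaf, which matches the reason given in the log for storing them in reverse.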
2008-10-01 01:07 the tux3 buffer cache emulation is quite close to the real thing 2008-10-01 01:08 maybe they don't want to have large an extent ? 2008-10-01 01:09 bh, I think they do have a btree in there 2008-10-01 01:09 I saw generic extent manipulation stuff in there 2008-10-01 01:09 no b-tree that I immediately saw 2008-10-01 01:09 there is this notion of an extent cache, I wonder what that is about 2008-10-01 01:10 generally, where you see "path" variables there is a btree 2008-10-01 01:11 and not another on-disk structure ? 2008-10-01 01:13 there are lead manipulation routines but it didnt look like they were manipulating a b-tree 2008-10-01 01:14 +err = ext4_ext_insert_extent(handle, inode, path, &newex); 2008-10-01 01:14 the "inode" parameter makes me think it's against an inode tree instead of a b-tree 2008-10-01 01:17 bh, see ext4_ext_show_path, it's walk a structure that looks much like a btree 2008-10-01 01:17 probing I mean 2008-10-01 01:18 showing 2008-10-01 01:19 it's a very generic routine 2008-10-01 01:19 well look at /* walk through the tree */ 2008-10-01 01:19 it's probing a btree 2008-10-01 01:20 ext4_ext_path 2008-10-01 01:20 What's backing that struct ? 2008-10-01 01:21 blocks 2008-10-01 01:21 It's a path 2008-10-01 01:21 they're written out via mark_buffer_dirty_inode 2008-10-01 01:21 right ? 2008-10-01 01:22 question is what is a path in this code ? 
2008-10-01 01:22 that's an in-memory structure 2008-10-01 01:22 just for keeping track of position in the btree 2008-10-01 01:23 no question it's a btree 2008-10-01 01:23 funny that word doesn't appear in the code 2008-10-01 01:23 don't know, I have to stop looking at the path 2008-10-01 01:23 flips: correct 2008-10-01 01:24 ...patch 2008-10-01 01:24 me too 2008-10-01 01:24 that put me in a bad mood ;) 2008-10-01 01:24 too much code 2008-10-01 01:24 it's as big as all of tux3 at the moment 2008-10-01 01:26 http://www.phoronix.com/forums/showthread.php?t=1765 <- fs benchmarks from 2006 2008-10-01 01:26 reiser4 wins apparently 2008-10-01 01:28 actually, if you read it, ext2 wins, and ext4 not far behind 2008-10-01 01:28 marginally beats reiser4 at tar -c and -x 2008-10-01 01:29 kills reiser on delete 2008-10-01 01:29 so to speak 2008-10-01 01:34 hes already dead ;) 2008-10-01 02:07 -!- Kirantpatil(~kiran@122.167.192.43) has left #tux3 2008-10-01 02:25 flips: as long as tux3 has the same kind of stuff you're set 2008-10-01 02:25 or different stuff that works as well or better 2008-10-01 02:26 the goot thing about a b-tree being the center of the universe is that implementations like that can be done in terms of it 2008-10-01 02:26 good 2008-10-01 02:27 which is what you've done 2008-10-01 02:27 it's better to be efficient than to use a brute force method to building a system if possible 2008-10-01 02:28 pretty much need to have a btree in order to have extents 2008-10-01 02:28 ACTION still needs to look at dleaf.c more  2008-10-01 02:29 well, good luck with it 2008-10-01 02:37 ACTION is just about to hit the bed 2008-10-01 02:37 getting late and it's rally hot today/tonight 2008-10-01 02:55 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-01 02:55 hey daddy 2008-10-01 02:59 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3 2008-10-01 03:45 -!- pgquiles(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-10-01 
04:35 Fire! 2008-10-01 04:35 there goes another tux3 report off into the wild blue yonder 2008-10-01 04:41 flips: from where do you have the numbers? 2008-10-01 04:42 for the comparisons? 2008-10-01 04:42 which ones? 2008-10-01 04:42 ext3 vs tux3 with regard to deletion e.g. 2008-10-01 04:42 simple arithmetic 2008-10-01 04:43 average seek time: 6 ms 2008-10-01 04:43 transfer speed: 64 MB/sec 2008-10-01 04:43 ok, so you calculated it. did you account for the overhead of more complex computations for extents? Or are they nothing to worry about (probably not, regarding how slow disks are) 2008-10-01 04:43 1K pointers/4K block for ext2 comes from the 4 byte pointer size 2008-10-01 04:44 no, I did not add in the cpu overhead 2008-10-01 04:44 but cpu overhead is actually less 2008-10-01 04:44 well, less cases to look at 2008-10-01 04:44 because many simple operations are replaced by one complex operation 2008-10-01 04:45 will you do another post about the extent design? 2008-10-01 04:45 yes 2008-10-01 04:45 maybe tomorrow, maybe later 2008-10-01 04:46 I guess I will get it working the rest of the way first 2008-10-01 04:46 there are still a couple of biggish things to do 2008-10-01 04:46 have to put in the leaf splitting 2008-10-01 04:47 and the actual IO 2008-10-01 04:47 and the extent reading 2008-10-01 04:48 sounds like the better plan. have something to show :) 2008-10-01 04:48 agreed 2008-10-01 04:48 and even better plan: me get some sleep 2008-10-01 04:49 i was just wondering if you were still awake or already 2008-10-01 04:49 still 2008-10-01 04:49 not for very much longer 2008-10-01 04:50 ok. i'll get going too 2008-10-01 04:50 see you 2008-10-01 04:50 bye. 
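This sketches the ext2 side of the "simple arithmetic" behind the deletion comparison, using the figures quoted above: 6 ms average seek, 64 MB/sec transfer, and 1K pointers per 4K block from 4-byte pointers. It rests on three assumptions that are not from the report. Each indirect block read is taken to cost one full seek plus one block transfer. "MB" is taken as MiB. Only single-indirect blocks are counted. The tux3 report may derive its numbers differently.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in the discussion above. */
#define SEEK_MS        6.0                    /* average seek time */
#define TRANSFER_BPS   (64.0 * 1024 * 1024)   /* 64 MB/sec, taken as MiB (assumption) */
#define BLOCK_SIZE     4096
#define PTRS_PER_BLOCK (BLOCK_SIZE / 4)       /* 4-byte block pointers -> 1024 */

/* Approximate indirect blocks an ext2 file needs: one per 1024 data
 * blocks. Ignores the 12 direct pointers and the double/triple
 * indirect levels, so it is only a first-order estimate. */
static uint64_t ext2_indirect_blocks(uint64_t file_bytes)
{
	uint64_t data_blocks = (file_bytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
	return (data_blocks + PTRS_PER_BLOCK - 1) / PTRS_PER_BLOCK;
}

/* Worst case: every metadata block read costs one seek plus one
 * block transfer; CPU time is not counted. */
static double read_cost_ms(uint64_t blocks)
{
	double transfer_ms = BLOCK_SIZE / TRANSFER_BPS * 1000.0;
	return blocks * (SEEK_MS + transfer_ms);
}
```

For a 1 GiB file, `ext2_indirect_blocks()` gives 256, and `read_cost_ms(256)` comes to 1551.625 ms, almost all of it seek time. Suppose an extent-mapped file of the same size needed only one index block. Then `read_cost_ms(1)` would be about 6 ms. This is why a few large extents beat many per-block pointers on delete.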
2008-10-01 04:51 -!- bobby(~bobby@nat-inn.mentorg.com) has joined #tux3 2008-10-01 05:33 -!- Kirantpatil(~kiran@122.167.192.43) has joined #tux3 2008-10-01 05:33 -!- Kirantpatil(~kiran@122.167.192.43) has left #tux3 2008-10-01 06:05 flips, congrats on completing(partially even) extents 2008-10-01 06:24 -!- Bobby(~Bobby@nat-inn.mentorg.com) has joined #tux3 2008-10-01 06:25 hello all 2008-10-01 06:57 -!- Kirantpatil(~kiran@122.167.192.43) has joined #tux3 2008-10-01 06:57 -!- Kirantpatil(~kiran@122.167.192.43) has left #tux3 2008-10-01 08:37 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-01 09:38 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-01 11:19 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-01 11:55 flips: ping 2008-10-01 12:02 -!- pgquiles(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-10-01 12:15 tim_dimm, pong 2008-10-01 12:21 dang- now she's eating again 2008-10-01 12:21 20 minutes 2008-10-01 12:21 life in the daddy zone 2008-10-01 12:24 -!- Bobby(~Bobby@122.162.68.20) has joined #tux3 2008-10-01 12:24 hey all 2008-10-01 12:25 hi pranith 2008-10-01 12:25 hello flips 2008-10-01 12:25 was reading the ext2 chapter 2008-10-01 12:26 from utlk 2008-10-01 12:26 good thing to do 2008-10-01 12:26 its got some great info 2008-10-01 12:26 utlk is pretty good 2008-10-01 12:26 yup, but pretty outdated i guess 2008-10-01 12:26 he says that ext3 still in dev.. 2008-10-01 12:26 :D 2008-10-01 12:27 :) 2008-10-01 12:27 things actually change pretty slowly in kernel 2008-10-01 12:27 I wonder if he covers reverse mapping 2008-10-01 12:27 and rcu 2008-10-01 12:27 don't think either 2008-10-01 12:28 hmm 2008-10-01 12:28 rcu? 
2008-10-01 12:28 weird new locking method 2008-10-01 12:28 :( 2008-10-01 12:28 ohk 2008-10-01 12:28 read/copy/update 2008-10-01 12:29 ohk, found this http://lwn.net/Articles/174641/ 2008-10-01 12:30 ACTION is going to miss this place for the next few days 2008-10-01 12:30 :( out on a trip 2008-10-01 12:30 that's a use of rcu 2008-10-01 12:30 by the way 2008-10-01 12:30 hmm 2008-10-01 12:30 next tuesday I will be away 2008-10-01 12:31 oh 2008-10-01 12:31 maybe maze can do it 2008-10-01 12:31 hmm 2008-10-01 12:31 ok 2008-10-01 12:31 should give him some warning 2008-10-01 12:31 time to study up on something ;) 2008-10-01 12:31 okies 2008-10-01 12:31 flips, one thing 2008-10-01 12:31 you've any idea of aio in the current linux system? 2008-10-01 12:32 i tried it today.. 2008-10-01 12:32 yes 2008-10-01 12:32 din't seem to work... 2008-10-01 12:32 is it _properly_ supported now? 2008-10-01 12:32 yes 2008-10-01 12:32 I have a simple demo somewhere 2008-10-01 12:32 glibc uses the kernel implementation?? 2008-10-01 12:33 yes 2008-10-01 12:33 well 2008-10-01 12:33 it's confusing 2008-10-01 12:33 hmm 2008-10-01 12:33 two different interfaces 2008-10-01 12:33 I can never remember which is which 2008-10-01 12:33 and both complex 2008-10-01 12:33 :) 2008-10-01 12:33 ok 2008-10-01 12:33 irritating to use 2008-10-01 12:33 aio is in general irritating 2008-10-01 12:34 hmm 2008-10-01 12:34 what do u suggest if we had to write say 200mb of data 2008-10-01 12:34 and got some processing to do later 2008-10-01 12:34 i think aio would be perfect for this 2008-10-01 12:34 just dump it and continue with your work 2008-10-01 12:35 and finally check the status and wait till it is completed 2008-10-01 12:35 flips: done 2008-10-01 12:35 instead of waiting forever for the data to be written 2008-10-01 12:35 hey tim_dimm 2008-10-01 12:35 done with what? 
:) 2008-10-01 12:35 pranith, http://groups.google.com/group/zumastor/browse_thread/thread/7b0f5350a99c0d7d/c0bd2f4d698b3ad6?fwc=1 2008-10-01 12:35 hey pranith 2008-10-01 12:36 thnx flips 2008-10-01 12:36 done with feeding my very fussy 3 week old daughter 2008-10-01 12:36 pranith, basically just copied from the aio man page and cleaned up a little 2008-10-01 12:36 okies 2008-10-01 12:39 ok guys, c u on sunday 2008-10-01 12:39 tata 2008-10-01 12:58 Hello :D !!! 2008-10-01 13:09 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-01 13:14 -!- pgquiles(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-10-01 13:43 -!- flips(~phillips@phunq.net) has joined #tux3 2008-10-01 14:18 -!- ajonat(~ajonat@190.48.125.115) has joined #tux3 2008-10-01 14:18 -!- pgquiles(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-10-01 14:20 folks 2008-10-01 14:21 flips: where the anouncement btw ? 2008-10-01 14:21 you might lke to put the update in the topic 2008-10-01 14:25 -!- ChanServ changed mode/#tux3 -> +o flips 2008-10-01 14:25 -!- flips changed topic to "http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 p.m. Pacific Time ~ Next session: friends of grab_cache_page" 2008-10-01 14:25 -!- ChanServ changed mode/#tux3 -> -o flips 2008-10-01 14:35 -!- kbingham(~kbingham@92.19.37.221) has joined #tux3 2008-10-01 14:41 flips: tux3 email posting I mean 2008-10-01 14:41 which one? 2008-10-01 14:41 the most recent tux3 update 2008-10-01 14:42 you were up last last nigh 2008-10-01 14:42 night 2008-10-01 14:49 http://tux3.org/tux3 2008-10-01 16:05 -!- pgquiles(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-10-01 16:10 flips : ever decided to put some CMS/CSS :P ? 2008-10-01 16:10 orgthingy, sure, shapor was going to 2008-10-01 16:11 css? 
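The aio question above describes a common pattern: queue a large write, do other work, then wait for completion. Below is a minimal sketch of that pattern, using the POSIX AIO calls from `<aio.h>`: `aio_write`, `aio_error`, `aio_suspend` and `aio_return`. It is not the demo linked above. Note that glibc implements POSIX AIO with user-space helper threads rather than the kernel's `io_submit` interface (used through libaio). That split is the "two different interfaces" confusion mentioned above. On glibc older than 2.34 this needs `-lrt` at link time.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Queue a write of len bytes from buf at file offset off.
 * Returns 0 if queued, -1 with errno set otherwise. buf must stay
 * valid and unmodified until the request completes. */
static int start_async_write(struct aiocb *cb, int fd,
			     const void *buf, size_t len, off_t off)
{
	memset(cb, 0, sizeof *cb);
	cb->aio_fildes = fd;
	cb->aio_buf = (volatile void *)buf;
	cb->aio_nbytes = len;
	cb->aio_offset = off;
	cb->aio_sigevent.sigev_notify = SIGEV_NONE; /* no signal; we wait explicitly */
	return aio_write(cb);
}

/* Block until the request finishes; returns bytes written or -1. */
static ssize_t wait_async_write(struct aiocb *cb)
{
	const struct aiocb *const list[1] = { cb };
	int err;
	ssize_t ret;

	while ((err = aio_error(cb)) == EINPROGRESS)
		aio_suspend(list, 1, NULL); /* may return early (EINTR); just loop */
	ret = aio_return(cb);           /* call exactly once per request */
	if (err != 0) {
		errno = err;
		return -1;
	}
	return ret;
}
```

Call `start_async_write()`, do the processing, then call `wait_async_write()` before reusing or freeing the buffer.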
2008-10-01 16:11 if cms, then i really advice to use b2evolution 2008-10-01 16:11 had experience with it in my old bluecirclet.com site 2008-10-01 16:11 we newed DB at bluecirclet.com :( 2008-10-01 16:11 well, it was my partner's fault.. 2008-10-01 16:11 :P 2008-10-01 16:12 check out shapor's site, linked from tux3.org 2008-10-01 16:12 k 2008-10-01 16:12 http://www.totalnetsolutions.net/2007/08/13/how-to-increase-battery-life-in-ubuntu-or-debian-linux/ 2008-10-01 16:12 interesting 2008-10-01 16:13 why is b2evolution better than other php blogging packages? 2008-10-01 16:13 flips : because it was my fav.. i tried many 2008-10-01 16:13 it was probably (one of) best i tried 2008-10-01 16:14 but some like WordPress are fine as well 2008-10-01 16:14 couple of reasons for it being your fav? 2008-10-01 16:19 -!- pgquiles(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-10-01 16:19 ACTION thinks plain text files are the best :P 2008-10-01 16:19 spot the bug, it has been in since forever: 2008-10-01 16:19 int ileaf_check(BTREE, struct ileaf *leaf) 2008-10-01 16:19 { 2008-10-01 16:19 char *why; 2008-10-01 16:19 why = "not an inode table leaf"; 2008-10-01 16:19 if (leaf->magic != 0x90de) 2008-10-01 16:19 goto eek; 2008-10-01 16:19 why = "dict out of order"; 2008-10-01 16:20 if (!isinorder(btree, leaf)) 2008-10-01 16:20 goto eek; 2008-10-01 16:20 return 0; 2008-10-01 16:20 eek: 2008-10-01 16:20 printf("%s!\n", why); 2008-10-01 16:20 return -1; 2008-10-01 16:20 } 2008-10-01 16:20 well 2008-10-01 16:20 I gave it away 2008-10-01 16:20 that's the fixed version 2008-10-01 16:20 if (leaf->magic != 0x90de); 2008-10-01 16:20 goto eek; 2008-10-01 16:20 if (leaf->magic != 0x90de); 2008-10-01 16:20 goto eek; 2008-10-01 16:20 ; ?!? 2008-10-01 16:21 :D 2008-10-01 16:21 nobody noticed, and nobody noticed the error that has been printed out every time make tests is run 2008-10-01 16:22 21 days till I have time for FS again :| 2008-10-01 16:23 what happens then? 
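The ileaf_check bug above is a stray `;` after the `if`. That semicolon is an empty statement, so the `goto eek;` underneath runs unconditionally. The result is that every leaf gets reported as bad, which matches the error printed on every `make tests` run. Here is a reduced reproduction, built on a hypothetical cut-down `struct leaf` that carries only the magic field. gcc's `-Wempty-body` (enabled by `-Wextra`) flags this pattern.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical cut-down leaf: only the magic field matters here. */
struct leaf { unsigned short magic; };

/* Buggy: the ';' after the if is an empty body, so the goto below
 * always runs, whatever the magic is. */
static int check_buggy(const struct leaf *leaf)
{
	if (leaf->magic != 0x90de);
		goto eek;
	return 0;
eek:
	fprintf(stderr, "not an inode table leaf!\n");
	return -1;
}

/* Fixed: the goto is the body of the if. */
static int check_fixed(const struct leaf *leaf)
{
	if (leaf->magic != 0x90de)
		goto eek;
	return 0;
eek:
	fprintf(stderr, "not an inode table leaf!\n");
	return -1;
}
```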
2008-10-01 16:23 out of jail? 2008-10-01 16:26 it's pretty 2008-10-01 16:26 I might run it just for laughs 2008-10-01 16:27 better that the crap on tux3.org now 2008-10-01 16:27 thats the best part about bash blogger 2008-10-01 16:27 it just pre renders static html/css 2008-10-01 16:28 so no added security issues either 2008-10-01 16:28 and faster 2008-10-01 16:28 the way it ought to be 2008-10-01 16:29 I noted that 2008-10-01 16:29 plus it has attitude 2008-10-01 16:33 i've been meaning to set it up for the ugly page which is shapor.com 2008-10-01 16:41 shapor : hello 2008-10-01 16:42 has attitude? 2008-10-01 16:42 lol 2008-10-01 16:43 on 22 is the IPSN deadline 2008-10-01 16:44 till then it will be worse and worse :P 2008-10-01 16:44 timewise speaking 2008-10-01 16:45 how come all of you guys are C programmers and know about FS :( 2008-10-01 16:45 no good place to learn C/FS in Middle east 2008-10-01 16:45 no C books in middle east 2008-10-01 16:45 :( 2008-10-01 16:46 I come from Romania ;-) 2008-10-01 16:47 with so much code around a C book it not so useful at it was before 2008-10-01 17:00 flips 2008-10-01 17:00 i was wondering 2008-10-01 17:00 since ext3cow is just FS you wanted 2008-10-01 17:00 but has missing features 2008-10-01 17:00 why dont you just simply make tux3 based on ext3cow ? 2008-10-01 17:01 ipsn? 2008-10-01 17:02 orgthingy, because by the time I finished changing it, it would not resemble ext3 any more at all, and that would be more work than just starting from the beginning 2008-10-01 17:02 for one thing, tux3 does not use a journal 2008-10-01 17:03 i see 2008-10-01 17:03 and the ext3 file index format cannot be adapted for extents, it has to be thrown away and rewritten 2008-10-01 17:03 but little people over here.. do people know what tux3 is 2008-10-01 17:03 ? 
2008-10-01 17:03 the list goes on 2008-10-01 17:03 i knew tux3 from stumble-upon 2008-10-01 17:03 basically very little survives 2008-10-01 17:03 i see 2008-10-01 17:04 at least tux3 uses the ext2 directory code 2008-10-01 17:04 for now 2008-10-01 17:05 -!- inverse(~none@h80-net10.simres.netcampus.ca) has joined #tux3 2008-10-01 17:06 flips : how did you get involved in all this opensource project? 2008-10-01 17:06 opensource world* 2008-10-01 17:06 tux2 2008-10-01 17:07 so, tux2 was very first project you were ever involved in? 2008-10-01 17:07 anyway, ext3cow is not the fs I want, I said it is a good project, not that I want to use it myself 2008-10-01 17:07 same goes for btrfs and ext4 2008-10-01 17:08 oh common, ext4 sounds exciting 2008-10-01 17:08 not to me 2008-10-01 17:08 aha ^_^ 2008-10-01 17:10 sk8 oclock 2008-10-01 17:17 http://ipsn.acm.org/2009/ 2008-10-01 17:17 I love the sk8 oclock thing :D 2008-10-01 17:29 flipsout: nice 2008-10-01 17:34 flipsout: have you heard of dmapi ? 2008-10-01 17:34 http://en.wikipedia.org/wiki/XFS#DMAPI 2008-10-01 17:56 shapor, I have 2008-10-01 17:56 permabanned from linux for some unknown reason 2008-10-01 17:56 hrm 2008-10-01 17:57 some people do use it apparently 2008-10-01 17:57 let me take a closer look 2008-10-01 17:58 wow. i've just written the worst code since my first hello world... static allocation of 10000 slots for the free list for the toy os we are coding at university :) 2008-10-01 17:58 but shared mem is supported :) 2008-10-01 17:59 razvanm, good look with your paper then, it's a paper, right? 
2008-10-01 17:59 flips: yup 2008-10-01 17:59 ACTION has yet to write his worst code 2008-10-01 17:59 true, the worse code is always ahead ;-) 2008-10-01 17:59 ok, worst code yet :) 2008-10-01 20:04 -!- ajonat(~ajonat@190.48.123.108) has joined #tux3 2008-10-01 20:11 -!- Kirantpatil(~kiran@122.167.213.15) has joined #tux3 2008-10-01 20:11 -!- Kirantpatil(~kiran@122.167.213.15) has left #tux3 2008-10-01 20:18 Question: does Tux3 support file creation dates? I started playing with it today and noticed that everything was set to "1969-12-31 19:00" 2008-10-01 20:24 inverse: yeah, they're just not used by fuse iirc 2008-10-01 20:24 by the tux3fuse program that is 2008-10-01 20:25 ah, that makes sense. I was confused because I saw an email on the list that referred to them being correct :) 2008-10-01 20:30 hmm, there are 2 different fuse implemenations 2008-10-01 20:30 maybe its in one not the other 2008-10-01 20:30 there is tux3fs and tux3fuse 2008-10-01 20:35 yes, now I have it working :) 2008-10-01 20:35 cool! 2008-10-01 20:35 whatd you chanfe? 2008-10-01 20:35 change* 2008-10-01 20:36 "make makefs" instead of "make testfs" after looking at the source files again I now understand what those are :p 2008-10-01 21:08 http://en.wikipedia.org/wiki/Ext4 <- the one feature I see here we need to add is persistent preallocation 2008-10-01 21:08 should be able to do that with essentially zero code 2008-10-01 21:36 hey flips 2008-10-01 21:49 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-01 21:56 shapor: your link took me down the rabbit hole to this 2008-10-01 21:56 http://en.wikipedia.org/wiki/Comparison_of_file_systems 2008-10-01 21:56 Daniel needs a wiki entry 2008-10-01 22:19 hey tim_dimm 2008-10-01 22:19 wassap? 
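Persistent preallocation, the ext4 feature singled out above, reaches applications through `fallocate(2)` and its portable wrapper `posix_fallocate(3)`. Here is a minimal sketch of the caller side only. It says nothing about how tux3 would implement preallocation internally. POSIX guarantees that, after a successful call, writes inside the reserved range will not fail for lack of free space. `posix_fallocate()` returns an error number directly instead of setting `errno`. When the filesystem has no native support, glibc falls back to writing the blocks itself.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Reserve len bytes starting at offset 0, extending the file if
 * needed. Returns 0 or an error number (errno is not set). */
static int reserve_space(int fd, off_t len)
{
	return posix_fallocate(fd, 0, len);
}

/* Current file size, or -1 on error. */
static off_t file_size(int fd)
{
	struct stat st;
	return fstat(fd, &st) == 0 ? st.st_size : -1;
}
```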
2008-10-01 22:20 tim_dimm: there already is one for tux3 2008-10-01 22:20 saw that- needs to be one for Daniel too 2008-10-01 22:20 everyone else has one 2008-10-01 22:20 (all the kids are doing it) 2008-10-01 22:21 shapor to the bat channel 2008-10-02 03:07 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3 2008-10-02 04:19 -!- bh(~billh@ip68-107-26-122.sd.sd.cox.net) has joined #tux3 2008-10-02 07:08 -!- orgthingy(~orgthingy@62.150.55.188) has joined #tux3 2008-10-02 07:08 hello! 2008-10-02 08:55 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-02 09:35 -!- Kirantpatil(~kiran@122.166.93.80) has joined #tux3 2008-10-02 09:35 -!- Kirantpatil(~kiran@122.166.93.80) has left #tux3 2008-10-02 11:23 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-02 14:05 -!- kd(~kd@121.246.35.242) has joined #tux3 2008-10-02 14:09 -!- orgthingy_(~orgthingy@62.150.55.188) has joined #tux3 2008-10-02 15:47 well, extent read is a little harder than extent write in one respect 2008-10-02 15:47 for the write, we know to form up extents by searching for adjoining dirty regions 2008-10-02 15:47 dirty buffers I mean 2008-10-02 15:48 for read we don't 2008-10-02 15:48 I suppose I could implement readahead here 2008-10-02 15:48 and just go read a whole extent every time somebody asks for a buffer 2008-10-02 15:48 for now 2008-10-02 15:48 the problem is, the buffer at a time high level interface is lame 2008-10-02 15:49 but accurately models what we will get in kernel 2008-10-02 15:49 we need an extent at a time interface that comes from the sys_write level 2008-10-02 15:50 which there is a hook for 2008-10-02 15:50 but it means bypassing the whole generic_read/write mess 2008-10-02 15:50 which might be ok in that it means bypassing a big mess 2008-10-02 15:50 but it also means we will have to maintain essentially a forked version of the read/write library 2008-10-02 15:51 volunteers? 
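On the write side, as described above, extents are formed by searching for adjoining dirty buffers. Below is a minimal sketch of that search, built on a hypothetical model in which `dirty[i]` marks whether logical block `i` has a dirty buffer in cache. Starting from one dirty block, it grows the run in both directions up to a length cap. Reads have no equivalent signal, which is why the read side would need something like readahead instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: dirty[i] is true if logical block i has a dirty
 * buffer. Starting at `index` (which must be dirty), extend the run
 * over adjoining dirty blocks, at most max_len blocks in total.
 * Returns the run length and stores its first block in *start. */
static size_t dirty_extent(const bool *dirty, size_t nblocks, size_t index,
			   size_t max_len, size_t *start)
{
	size_t lo = index, hi = index + 1;   /* the run is [lo, hi) */

	assert(index < nblocks && dirty[index] && max_len > 0);
	while (lo > 0 && dirty[lo - 1] && hi - lo < max_len)
		lo--;                            /* grow backward first */
	while (hi < nblocks && dirty[hi] && hi - lo < max_len)
		hi++;                            /* then forward */
	*start = lo;
	return hi - lo;
}
```

Growing backward before forward is an arbitrary choice in this sketch. A real flusher might prefer to start each extent at the lowest dirty block so that writes go out in ascending order.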
2008-10-02 16:13 -!- ChanServ changed mode/#tux3 -> +o flips 2008-10-02 16:15 -!- flips changed topic to "http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 pm Pacific Time ~ Next session: friends of grab_cache_page ~ Postponed till 9 pm tonight, thursday Oct 2" 2008-10-02 16:15 -!- flips changed mode/#tux3 -> -o flips 2008-10-02 16:15 nice 2008-10-02 16:15 flips : you should ask in other channels 2008-10-02 16:15 here all people try to help 2008-10-02 16:16 flips : ask in Linux and opensource channels 2008-10-02 16:16 or, in EFnet 2008-10-02 16:16 ? 2008-10-02 16:16 some may-be interested 2008-10-02 16:16 about what? 2008-10-02 16:16 in helping you 2008-10-02 16:16 oh 2008-10-02 16:16 feel free to ask 2008-10-02 16:16 well, i really dont know much about it? 2008-10-02 16:16 so, i dont really know if i can ask 2008-10-02 16:16 that's ok 2008-10-02 16:16 :P 2008-10-02 16:17 ACTION invites flips to #ubuntu-offtopic @ irc.freenode.net 2008-10-02 16:17 just tell people a filesystem project needs devs, is willing to help them learn 2008-10-02 16:17 sure 2008-10-02 16:18 :) 2008-10-02 16:20 doesn't sound too professional. :| 2008-10-02 16:20 * snuxoll knows nothing about filesystem design flips 2008-10-02 16:20 see? looks matter :P 2008-10-02 16:20 but, still, ask 2008-10-02 16:20 and explain 2008-10-02 16:20 there are lots of people there 2008-10-02 16:24 sure, but C coders? 2008-10-02 16:24 flips : common 2008-10-02 16:24 its linux channel 2008-10-02 16:24 its full of C coders :P 2008-10-02 16:26 you'd be surprised 2008-10-02 16:26 these days linux coding seems to be more about php than anything 2008-10-02 16:27 flips : php? 
2008-10-02 16:27 nay 2008-10-02 16:27 Python and C 2008-10-02 16:27 C++ a bit 2008-10-02 16:27 << python 2008-10-02 16:28 orgthingy, we could use more people playing with the fuse stuff 2008-10-02 16:28 and just trying it and complaining about broken things 2008-10-02 16:28 that's one way to get shapor to code ;) 2008-10-02 16:28 flips : FreeNode is like a UNIX and Linux network 2008-10-02 16:28 lots of programmers there 2008-10-02 16:28 he's awesome when he does 2008-10-02 16:28 you *should* find someone there 2008-10-02 16:28 heh :) 2008-10-02 16:29 flips : and sourceforge would be good idea, so is stumble-upon and digg 2008-10-02 16:30 flips : if it stays small, it'd be "just another free software project" but if you "market" it (not business term) it'd be "just another great opensource project" 2008-10-02 16:30 ok, there's my troll 2008-10-02 16:30 anything else has to come from the grassroots 2008-10-02 16:30 that means you, orgthingy 2008-10-02 16:31 "troll" ? 2008-10-02 16:31 ACTION didnt understand what flips meant  2008-10-02 16:32 wikepedia that 2008-10-02 16:33 you mean troll as is in "annoying useless dude in IRC" ? 2008-10-02 16:33 orgthingy, my time is better spent making code happen, it's up to people who want to help to go spread the word 2008-10-02 16:33 flips : well, i think time is worth looking for people to *code* with you 2008-10-02 16:33 troll as in "saying something controversial in order to get a response" 2008-10-02 16:33 get what i mean? 
2008-10-02 16:34 oh, common :( 2008-10-02 16:34 orgthingy, my time is also better spent encouraging people to go out and find coders than going out and hunting myself 2008-10-02 16:34 ok 2008-10-02 16:34 ACTION hides 2008-10-02 16:35 time being at a premium here 2008-10-02 16:35 sorry :| 2008-10-02 16:35 got to get extent reading working today according to me 2008-10-02 16:35 see the resonding lack of response on ubuntu channel 2008-10-02 16:35 prevailing attitude seems to be "work is somebody else's problem, we're here to hang and feel leet" 2008-10-02 16:36 maybe that's not accurate 2008-10-02 16:36 flips : maybe because it's offtopic channel 2008-10-02 16:36 and maybe because ubuntu users dont program 2008-10-02 16:36 I think the latter is the reason 2008-10-02 16:36 one of the reasons 2008-10-02 16:37 willingness to contribute could certainly be better 2008-10-02 16:37 being willing to always lose the early adopter race to gentoo is not a healthy attitude 2008-10-02 16:37 ok, im asking while you're coding 2008-10-02 16:37 :D 2008-10-02 16:38 seems fair 2008-10-02 16:38 best strategy is just for somebody like you to say there, "there's cool stuff going down on oftc #tux3, why not drop by for a visit" 2008-10-02 16:39 flips : well, thats called trolling in freenode :P 2008-10-02 16:40 or, spamming 2008-10-02 16:40 Id rather ask if someone is interested 2008-10-02 16:40 if they are, ill tell them to come by 2008-10-02 16:40 drop* 2008-10-02 16:40 orgthing, not in #offtopic 2008-10-02 16:40 ok 2008-10-02 16:40 im asking in many channels like #C 2008-10-02 16:40 if it worries you, then give a url instead of a channel 2008-10-02 16:40 #c would be good 2008-10-02 16:41 any c coder can become a kernel coder, or if they don't like kernel, fuse is entirely userspace 2008-10-02 16:41 flips : sorry, but i said "we" but i think suing "we" is better than "he" :P 2008-10-02 16:41 we is correct 2008-10-02 16:42 "we" as in "everybody who thinks fat baby penquins are cute" 
2008-10-02 16:42 haha 2008-10-02 16:44 flips : i think all programmers are asleep now 2008-10-02 16:44 usually, all of them go like "i want to join!" 2008-10-02 16:44 meh, maybe they're asleep :P 2008-10-02 16:44 programmers are usually asleep 2008-10-02 16:44 just as shapor 2008-10-02 16:45 just ask shapor 2008-10-02 16:45 some are drunk :P (yes really xD) 2008-10-02 16:45 or drunk, right, when they're awake 2008-10-02 16:45 that's why tux3 project has a requirement for beer to be sent 2008-10-02 16:46 to keep our programmers "in the zone" 2008-10-02 16:47 hmm, some say they're already programming in other projects :P 2008-10-02 16:47 haha 2008-10-02 16:47 haha 2008-10-02 16:48 don't take that for an answer ;) 2008-10-02 16:48 flips : ill ask Eloxoph people (i was staff there once) 2008-10-02 16:48 got to have a better excuse than that for being lame 2008-10-02 16:48 its full of C programmers 2008-10-02 16:48 but they're already working on 2 projects 2008-10-02 16:48 sounds good 2008-10-02 16:48 but ill ask them anyway :P 2008-10-02 16:48 2 is not enough 2008-10-02 16:48 should be 3 2008-10-02 16:54 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-02 16:55 I'll be hanging on #freenode if any ubuntus want to chat, flipz, not on any channel 2008-10-02 16:55 heh 2008-10-02 16:55 FelipeS : hello 2008-10-02 16:55 orgthingy, hey 2008-10-02 16:55 so, finally got interested? 2008-10-02 16:56 friend of yours orgthingy? 2008-10-02 16:56 flips : from ##not-physics over there 2008-10-02 16:56 any dev with physics background tends to be interesting to me 2008-10-02 16:56 likewise music and math 2008-10-02 16:57 seem to be generally more aware design wise 2008-10-02 16:57 I'm just a young student. You got me there with the music background. I'm not into music at all. 2008-10-02 16:57 ok, let's see if the read extent generator works 2008-10-02 17:02 felipes, physics? 2008-10-02 17:02 or does #not-physics really mean not physics? 
2008-10-02 17:02 flips : no, not-physics is offtopic channel of ##physics that i founded 2008-10-02 17:03 right, so it means "physics student" essentially? 2008-10-02 17:03 folks 2008-10-02 17:03 :P 2008-10-02 17:03 ok, physics prof then 2008-10-02 17:04 ACTION has to find his hotel receipts and stuff for reembursement 2008-10-02 17:04 mad scientist? 2008-10-02 17:04 flips : uuuumm? 2008-10-02 17:05 what is #physics about? 2008-10-02 17:05 flips : PHYSICS :P ? 2008-10-02 17:05 flips : how about going on topic with felipes? 2008-10-02 17:06 felipes, got c skillz? 2008-10-02 17:08 eh 2008-10-02 17:08 Honestly I'm in no position for being a dev or working on serious stuff 2008-10-02 17:08 at least I don't think so 2008-10-02 17:09 I just came here to maybe see some conversations on real work being done. 2008-10-02 17:09 sure, good place for that 2008-10-02 17:09 flips : he knows C++ most 2008-10-02 17:09 are you like Linus T. ? not allowing c++ code at all :P 2008-10-02 17:09 c++ has a certain influence over what we do 2008-10-02 17:10 I have nothing against c++, I would be ok with allow some files in kernel to be compiled that way, with appropriate care 2008-10-02 17:10 I'm a first year computer eng major at tech, programming is a hobby that got ahold of me about 4 years ago. I've never done anything impressing with it however. just learning here and there. 2008-10-02 17:10 but linus will not allow it so that ends that 2008-10-02 17:10 and it's all been with c++ 2008-10-02 17:11 computer eng is a good place to get a perspective on software efficiency 2008-10-02 17:11 c++ includes c, most of it 2008-10-02 17:11 yeah I know 2008-10-02 17:11 c++ lacks designated initializers, which we use extensivley, without them I'd be unwilling to do a kernel project in c++ even if it was ok with linus 2008-10-02 17:11 is it not 100% backwards compatible? 
2008-10-02 17:11 'backwards' :P 2008-10-02 17:11 not 100% 2008-10-02 17:11 stupidly so 2008-10-02 17:15 "sure, good place for that"; was that sarcasm? flips 2008-10-02 17:16 not at all 2008-10-02 17:16 sarcasm always comes with a :p 2008-10-02 17:16 some of the best devs in the known universe hang here 2008-10-02 17:16 just have to catch them talking ;) 2008-10-02 17:18 oh well that's great. I'll be sure to add it to my favs then. 2008-10-02 17:18 filemap_extent_read: logical block 0x5 of inode 0x0 2008-10-02 17:18 ---- extent 0x5/1 ---- 2008-10-02 17:18 prior extents: 2008-10-02 17:18 ---- rewind to 0x0 => 0/1 ---- 2008-10-02 17:18 filemap_extent_read: index 5, limit 6 2008-10-02 17:19 filemap_extent_read: offset = 0, gap = 0 2008-10-02 17:19 filemap_extent_read: fill gap at 5/1 2008-10-02 17:19 balloc extent -> [2/1] 2008-10-02 17:19 segs (offset = 0): 5 => 2/1; (1) 2008-10-02 17:19 well things are starting to happen with extent read 2008-10-02 17:20 course it should not be allocating blocks on read 2008-10-02 17:20 that's because it started as a cut n paste of extent write 2008-10-02 17:20 needs to fill those buffers with zero instead 2008-10-02 17:20 hmm 2008-10-02 17:21 no, just the one buffer 2008-10-02 17:21 ...maybe 2008-10-02 17:21 I doubt "fill ahead" is a win 2008-10-02 17:29 getting closer... 
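Designated initializers, which the discussion above says tux3 uses extensively, are a C99 feature. C++ only gained a restricted form in C++20: members must appear in declaration order, and array designators are not allowed. Here is a small sketch of the two forms C allows, using a hypothetical ops table in the style of kernel operation structs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operations table in the style of kernel ops structs. */
struct demo_ops {
	int (*open)(int);
	int (*read)(int, char *, size_t);
	int (*release)(int);
	const char *name;
};

static int demo_open(int fd) { return fd >= 0 ? 0 : -1; }
static int demo_release(int fd) { (void)fd; return 0; }

/* Fields may be named in any order; members not mentioned
 * (here .read) are zero-initialized, i.e. NULL. */
static const struct demo_ops ops = {
	.name = "demo",
	.release = demo_release,
	.open = demo_open,
};

/* Array designators: unnamed slots are zero (NULL). */
static const char *const errnames[8] = {
	[2] = "ENOENT",
	[5] = "EIO",
};
```

Because unnamed members default to zero, code can test an ops pointer for NULL and fall back to a generic routine. That is the usual kernel idiom.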
2008-10-02 17:29 filemap_extent_read: logical block 0x5 of inode 0x0 2008-10-02 17:29 ---- extent 0x5/1 ---- 2008-10-02 17:29 prior extents: 2008-10-02 17:29 ---- rewind to 0x0 => 0/1 ---- 2008-10-02 17:29 filemap_extent_read: index 5, limit 6 2008-10-02 17:29 filemap_extent_read: offset = 0, next = 6, gap = 1 2008-10-02 17:29 filemap_extent_read: fill gap at 5/1 2008-10-02 17:29 balloc extent -> [2/1] 2008-10-02 17:29 segs (offset = 0): 5 => 2/1; (1) 2008-10-02 17:29 filemap_extent_read: extent 0x5/1 => 2 2008-10-02 17:29 filemap_extent_read: read block 0x5 => 2 2008-10-02 17:29 now need to get rid of that balloc/read and replace with fill for unmapped buffer 2008-10-02 18:08 seg[segs++] = *(struct extent *)(u64[]){ -1LL }; <- some nasty c 2008-10-02 18:08 thought I'd show that before throwing it away 2008-10-02 18:10 now does the right thing for unmapped buffers: 2008-10-02 18:10 filemap_extent_read: logical block 0x5 of inode 0x0 2008-10-02 18:10 ---- extent 0x5/1 ---- 2008-10-02 18:10 prior extents: 2008-10-02 18:10 ---- rewind to 0x0 => 0/1 ---- 2008-10-02 18:10 filemap_extent_read: index 5, limit 6 2008-10-02 18:10 filemap_extent_read: offset = 0, next = 6, gap = 1 2008-10-02 18:10 filemap_extent_read: fill gap at 5/1 2008-10-02 18:10 segs (offset = 0): 5 => ffffffffffff/1; (1) 2008-10-02 18:10 filemap_extent_read: extent 0x5/1 => ffffffffffff 2008-10-02 18:10 filemap_extent_read: read block 0x5 => ffffffffffff 2008-10-02 18:10 filemap_extent_read: zero fill buffer 2008-10-02 18:10 which brings us to sk8 oclock 2008-10-02 19:53 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-02 19:55 oh... no class today? 2008-10-02 20:01 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-02 20:02 aaa... 9pm tonight :P 2008-10-02 20:17 really? 9? 2008-10-02 20:20 I have to be up in about 6h... :( 2008-10-02 20:28 I'm falling asleep already... 
2008-10-02 20:28 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-02 20:28 me too 2008-10-02 20:28 http://lxr.linux.no/linux+v2.6.26.5/fs/inode.c#L124 2008-10-02 20:29 about the first Q from the homework... 2008-10-02 20:29 what was the second one? 2008-10-02 20:31 aaa... the locks when a file is closed 2008-10-02 20:34 hmm 2008-10-02 20:36 I've already answered both. 2008-10-02 20:36 http://lxr.linux.no/linux+v2.6.26.5/fs/open.c#L1175 files->file_lock cannot be the lock we were talking about... 2008-10-02 20:36 let me check my logs 2008-10-02 20:37 ok, so let's pick it up again next thursday 2008-10-02 20:37 ok, first home work was why both ptr and struct address_space (*i_mapping and i_data) in struct inode 2008-10-02 20:37 http://lxr.linux.no/linux+v2.6.26.5/fs/locks.c#L1567 so the locks are tided to filp 2008-10-02 20:37 I will be offline for a few days 2008-10-02 20:37 ah 2008-10-02 20:37 it goes on with out me ;) 2008-10-02 20:37 (the ideal situation) 2008-10-02 20:38 :-) 2008-10-02 20:38 and the above L124 is not quite the answer 2008-10-02 20:38 ralucam, you following ok? 2008-10-02 20:38 MaZe: true, it's the point where are made the same... 2008-10-02 20:38 that's were it gets set to the default value, but why do you need a pointer, couldn't u always use &i_data 2008-10-02 20:38 the i_data doesn't seem to be used much 2008-10-02 20:39 inode->mapping is defined to be 2008-10-02 20:39 always valid 2008-10-02 20:39 re 1175 - not sure what you mean 2008-10-02 20:39 sorry... I was wrong about L124 2008-10-02 20:40 the locks in L1175 and L1567 are not quite the ones we were talking about re fget_light/fput_light 2008-10-02 20:40 so the thing from L124 is put in the structure here: http://lxr.linux.no/linux+v2.6.26.5/fs/inode.c#L184 2008-10-02 20:40 MaZe: aaaa... 
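The first homework question above asks why struct inode carries both the embedded `i_data` and the `i_mapping` pointer. The sketch below uses hypothetical, heavily simplified stand-ins for the kernel types. By default `i_mapping` points at the inode's own `i_data`. Because it is a pointer, it can also be redirected to another inode's page cache, as the coda and raw device code in this log does. A bare `&i_data` could not express that.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct address_space { unsigned long nrpages; };

struct inode {
	struct address_space i_data;     /* this inode's own page cache */
	struct address_space *i_mapping; /* the page cache actually used */
};

/* Default set at inode init: use the embedded mapping. */
static void inode_init(struct inode *inode)
{
	inode->i_data.nrpages = 0;
	inode->i_mapping = &inode->i_data;
}

/* Redirect to another inode's page cache, guarded the way the coda
 * snippet is: only if still pointing at our own i_data. */
static void share_mapping(struct inode *inode, struct inode *host)
{
	if (inode->i_mapping == &inode->i_data)
		inode->i_mapping = host->i_mapping;
}

/* Code that always goes through i_mapping sees the shared cache. */
static void add_page(struct inode *inode)
{
	inode->i_mapping->nrpages++;
}
```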
2008-10-02 20:40 right 2008-10-02 20:40 but that's the default path 2008-10-02 20:41 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-02 20:41 now I'm searching for the places where i_mapping is changed :P 2008-10-02 20:41 the one which could have sed / "->i_mapping->" "->i_data." / 2008-10-02 20:41 killing lxr in the process ;-) 2008-10-02 20:42 maze, not quite 2008-10-02 20:42 hmm? 2008-10-02 20:42 i_mapping is a pointer, i_data is an object 2008-10-02 20:42 hence -> to . 2008-10-02 20:42 I'm searching for i_mapping only ;-) 2008-10-02 20:42 right 2008-10-02 20:43 I don't expect the name to be used for anything else 2008-10-02 20:43 sorry, my eyes got crossed with the sed ;) 2008-10-02 20:43 ok, searching logs for the second homework question 2008-10-02 20:43 there was a bonus about why fget_light is demented 2008-10-02 20:45 a: breaks refcounting purely to reduce cacheline pinging 2008-10-02 20:45 ah, ok, so that wasn't the bonus, that was just the second homework 2008-10-02 20:45 accurately characterized by akpm as "foul" 2008-10-02 20:45 http://lxr.linux.no/linux+v2.6.26.5/fs/block_dev.c#L290 2008-10-02 20:45 looks closer 2008-10-02 20:46 but no cigar 2008-10-02 20:46 that's the default value as well 2008-10-02 20:46 I still am not sure why we need i_mapping -> -i_data 2008-10-02 20:46 i_mapping = &i_data 2008-10-02 20:46 and http://lxr.linux.no/linux+v2.6.26.5/fs/block_dev.c#L425 2008-10-02 20:46 http://lxr.linux.no/linux+v2.6.26.5/fs/block_dev.c#L450 2008-10-02 20:46 to be more precise 2008-10-02 20:47 I suspect it's bogus, but a pretty extensive survey of usage would be required to say one way or the other 2008-10-02 20:47 that is the _main_ use case in the kernel 2008-10-02 20:47 (there are two others) 2008-10-02 20:47 right, coda and ? 2008-10-02 20:47 raw char devs 2008-10-02 20:47 char devs? 2008-10-02 20:48 why only char devs? 
2008-10-02 20:48 the primary use case seems to be block devices, the secondary is raw char devices (ie. for direct io to block devs, ancient pre-O_DIRECT interface), and third/last is the coda fs
2008-10-02 20:48 well the blockdev usage smells really bogus
2008-10-02 20:48 -!- ajonat(~ajonat@190.48.123.108) has joined #tux3
2008-10-02 20:48 inode->i_mapping = &inode->i_data;
2008-10-02 20:48 from coda ;-)
2008-10-02 20:48 wrong line
2008-10-02 20:49 since that's restoring the default
2008-10-02 20:49 I picked it because of that ;-)
2008-10-02 20:49 the rest are in file.c
2008-10-02 20:50 jeez, that blockdev code is twisted
2008-10-02 20:50 I think, product of fuzzy thinking and not necessity
2008-10-02 20:50 but more analysis is needed to be sure
2008-10-02 20:51 there should be some system to mark some stuff in the kernel as 'looks wrong to me' :P
2008-10-02 20:51 heh
2008-10-02 20:51 we carve our comments in the internet, right hew
2008-10-02 20:51 right here
2008-10-02 20:51 ./fs/coda/file.c-106- host_inode = host_file->f_path.dentry->d_inode;
2008-10-02 20:51 ./fs/coda/file.c-107- coda_file->f_mapping = host_file->f_mapping;
2008-10-02 20:51 ./fs/coda/file.c:108: if (coda_inode->i_mapping == &coda_inode->i_data)
2008-10-02 20:51 ./fs/coda/file.c:109: coda_inode->i_mapping = host_inode->i_mapping;
2008-10-02 20:51 ./fs/coda/file.c-110-
2008-10-02 20:51 ./fs/coda/file.c-111- /* only allow additional mmaps as long as userspace isn't changing
2008-10-02 20:52 where they will be unearthed by archaeologists millennia later
2008-10-02 20:52 that's the relevant part of coda
2008-10-02 20:52 ./drivers/char/raw.c-76- filp->f_mapping = bdev->bd_inode->i_mapping;
2008-10-02 20:52 ./drivers/char/raw.c-77- if (++raw_devices[minor].inuse == 1)
2008-10-02 20:52 ./drivers/char/raw.c:78: filp->f_path.dentry->d_inode->i_mapping =
2008-10-02 20:52 ./drivers/char/raw.c-79- bdev->bd_inode->i_mapping;
2008-10-02 20:52 ./drivers/char/raw.c-80- filp->private_data = bdev;
2008-10-02 20:52 and that's for raw char dev
2008-10-02 20:53 bdev again
2008-10-02 20:53 while I can see/understand the need for the raw char dev and coda use cases
2008-10-02 20:53 I don't yet get what the normal bdev case is for
2008-10-02 20:53 actually... they look similar
2008-10-02 20:53 coda and raw
2008-10-02 20:53 not surprising
2008-10-02 20:54 coda is a networked file system with local caching and offline operation
2008-10-02 20:54 the raw.c stuff looks bogus too
2008-10-02 20:54 at least partly
2008-10-02 20:54 why does d_inode need to have a mapping?
2008-10-02 20:54 it needs a way to tell the kernel that the page cache for a file in codafs is actually the page cache for another file in a local filesystem (ie. in the cache fs store)
2008-10-02 20:54 sorry
2008-10-02 20:54 while the raw char dev case needs to remap the raw char dev to the block dev
2008-10-02 20:55 bd_acquire is only called in 3 places...
2008-10-02 20:55 lying underneath it
2008-10-02 20:55 why does filp need a mapping, I meant to say
2008-10-02 20:55 maze, but that raw char dev case sounds like it could be done some other way
2008-10-02 20:56 hmm... how does caching for char devices work? :P
2008-10-02 20:56 a maybe
2008-10-02 20:56 it's if we open the same block dev from different inodes?
2008-10-02 20:56 from different entries and/or filesystems?
2008-10-02 20:56 something weird
2008-10-02 20:56 that offends my sense of form and balance
2008-10-02 20:56 since they're all actually the same block device, but they're not the same inode
2008-10-02 20:56 that makes sense
2008-10-02 20:56 want to have them backed by the same page cache
2008-10-02 20:57 makes sense! :D
2008-10-02 20:57 how about coda?
2008-10-02 20:57 we don't open block devices "from inodes"
2008-10-02 20:57 we open them from names
2008-10-02 20:57 sure we do ;-)
2008-10-02 20:57 we're talking about i_node->i_mapping remember ;-)
2008-10-02 20:57 right name -> dentry -> inode -> bla bla -> mapping
2008-10-02 20:57 $ stat hda1
2008-10-02 20:57 File: `hda1'
2008-10-02 20:57 Size: 0 Blocks: 0 IO Block: 4096 block special file
2008-10-02 20:57 Device: dh/13d Inode: 2343 Links: 1 Device type: 3,1
2008-10-02 20:57 Access: (0660/brw-rw----) Uid: ( 0/ root) Gid: ( 6/ disk)
2008-10-02 20:57 Access: 2008-10-01 15:41:08.151619201 -0400
2008-10-02 20:57 Modify: 2008-10-01 15:40:54.344130563 -0400
2008-10-02 20:58 Change: 2008-10-01 15:40:54.344130563 -0400
2008-10-02 20:59 stat hdaX
2008-10-02 20:59 File: `hdaX'
2008-10-02 20:59 Size: 0 Blocks: 0 IO Block: 4096 block special file
2008-10-02 20:59 Device: dh/13d Inode: 911348 Links: 1 Device type: 3,1
2008-10-02 20:59 Access: (0644/brw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
2008-10-02 20:59 MaZe is right
2008-10-02 20:59 # ls -al sda
2008-10-02 20:59 brw-rw---- 1 root disk 8, 0 2008-09-29 17:52 sda
# mknod sda__ b 8 0
2008-10-02 20:59 # ls -al sda__
2008-10-02 20:59 brw-r--r-- 1 root root 8, 0 2008-10-02 20:58 sda__
2008-10-02 20:59 # stat sda
2008-10-02 20:59 Device: eh/14d Inode: 339 Links: 1 Device type: 8,0
2008-10-02 20:59 # stat sda__
2008-10-02 20:59 Device: eh/14d Inode: 844038 Links: 1 Device type: 8,0
2008-10-02 20:59 :D
2008-10-02 20:59 I was first ;-)
2008-10-02 20:59 yup
2008-10-02 21:00 and it'll have yet another inode on a different partition/filesystem
2008-10-02 21:00 hence, different sb,inode pairs referring to the same block dev, but having to have the same page cache
2008-10-02 21:00 that's for sure
2008-10-02 21:00 I don't understand the page cache for char dev though
2008-10-02 21:01 the raw char dev is an abomination
2008-10-02 21:01 it behaves like a block dev
2008-10-02 21:01 nowadays the correct way is to open the normal (base) block dev with option O_DIRECT
2008-10-02 21:01 maze, but why does the kernel ever have to use the inode from the mknod file directly?
2008-10-02 21:01 oh...
2008-10-02 21:01 why not dereference to the real inode first?
2008-10-02 21:01 flips: dentry?
2008-10-02 21:02 could be a dentry reason, still sounds bogus
2008-10-02 21:02 real inode?
2008-10-02 21:02 even if it's in the dentry
2008-10-02 21:02 it used to be you would map the base dev to a raw dev and then get direct io on the raw dev, which for reasons which escape me wasn't a block dev, but a char dev instead
2008-10-02 21:02 dereference after getting hold of the dentry
2008-10-02 21:02 because bdevs don't have inodes?
2008-10-02 21:02 "if" it's a dentry reason
2008-10-02 21:02 the names can be different
2008-10-02 21:02 so the entries are different
2008-10-02 21:03 there is no real name, no real inode, the only thing there is is the block major, minor pair
2008-10-02 21:03 you want to alias them to the same inode?
2008-10-02 21:03 my sense that this extra level of indirection in ->mapping is being abused by blockdevs is getting stronger
2008-10-02 21:03 you could probably get away with having a blockdevfs
2008-10-02 21:03 which would be the fs with inodes for blockdevs, which you could then return instead of the proper inode from the normal filesystem
2008-10-02 21:03 except
2008-10-02 21:04 inodes also store permissions
2008-10-02 21:04 which might be different between different blockdev entries on the filesystem
2008-10-02 21:04 :D
2008-10-02 21:04 well let's put that one aside
2008-10-02 21:04 return to it later
2008-10-02 21:04 and you couldn't delete them...
2008-10-02 21:04 since you'd be deleting the wrong inode
2008-10-02 21:04 after I've had time to read more around that part of the code ;)
2008-10-02 21:05 so the inode references the entry in the filesystem
2008-10-02 21:05 it's often a mistake to assume that something which is really weird is that way because it has to be
2008-10-02 21:05 [hmm although deletions really delete dentries, not inodes]
2008-10-02 21:05 dirent, not dentry
2008-10-02 21:06 we reserve the latter name to mean the cache item
2008-10-02 21:06 right nomenclature...
2008-10-02 21:06 just convention
2008-10-02 21:06 I have a feeling you'd still need it for stuff like coda, or more advanced caching network fs'es anyway
2008-10-02 21:07 _maybe_
2008-10-02 21:07 why don't you write a stacking filesystem and see if you're forced to use that feature?
2008-10-02 21:07 ordinary filesystem is too easy for you ;)
2008-10-02 21:08 in truth, I've taken a light run at that myself and found the issues... hurtful
2008-10-02 21:08 our vfs was not designed to be stacked
2008-10-02 21:08 it's very much a fixed number of levels kind of thing
2008-10-02 21:09 every now and then some people show up to improve the stackability, and after pushing uphill for a while they go away again
2008-10-02 21:10 then when fuse came along, most of the motivation for stacking filesystems in kernel went away
2008-10-02 21:10 leaving just nfs... a quasi stacking filesystem... and coda... which I think I have very little understanding of
2008-10-02 21:10 for a while there was intermezzo
2008-10-02 21:10 which never quite got working
2008-10-02 21:11 well I don't think it was an official Tux3 U session today
2008-10-02 21:11 let's not post logs
2008-10-02 21:12 will resume in earnest next thursday
2008-10-02 21:12 with the friends, right? :D
2008-10-02 21:13 that would be a good place, or any requests?
2008-10-02 21:13 what is the biggest mystery remaining?
2008-10-02 21:13 dentry cache ranks up there
2008-10-02 21:13 path walk
2008-10-02 21:14 ->rename, worth a visit
2008-10-02 21:14 pdflush?
2008-10-02 21:14 it's really vm, but you need to know how it works to write fast filesystem code
2008-10-02 21:15 ACTION will still 'float' till 22 :|
2008-10-02 21:15 hmm, it may not even be generic enough for cool stuff like cow or versioned or other types of opts filesystems
2008-10-02 21:16 what is pdflush?
2008-10-02 21:16 maze, what might not be?
2008-10-02 21:16 pdflush? certainly isn't
2008-10-02 21:16 the current i_mapping pointer
2008-10-02 21:16 ah
2008-10-02 21:16 since you could potentially have
2008-10-02 21:16 yes, you could get into some really demented versioning tricks
2008-10-02 21:16 2 files on different inodes on different filesystems
2008-10-02 21:17 but we will use a very simple one...
2008-10-02 21:17 one version of a file gets loaded into the inode->mapping and that is it
2008-10-02 21:17 and the page cache could potentially be shared between them if they (or parts of them) refer to the same data
2008-10-02 21:17 and shared with the block dev cache that the files are stored on, etc
2008-10-02 21:17 we don't try to share mapping pages between different versions of the same file
2008-10-02 21:17 ACTION is off to bed. Good night to everyone!
2008-10-02 21:17 sharing mapping pages would require deep surgery
2008-10-02 21:17 see you
2008-10-02 21:17 yep, just what I realized
2008-10-02 21:17 good night!
2008-10-02 21:18 save that for linux 2.9
2008-10-02 21:18 but sharing pages between the blockdev and the fs on top of it, and the netfs exported/imported from it, and the various versions and cow files is what should happen
2008-10-02 21:19 why between various versions of cow files?
2008-10-02 21:19 if it lives in one spot on disk, it should only live in one spot in memory
2008-10-02 21:19 why does that matter?
2008-10-02 21:19 for the regions which are identical
2008-10-02 21:19 memory
2008-10-02 21:19 so we waste some cache by duplicating pages, what's the problem?
2008-10-02 21:19 you can get by with a much smaller cache, or make much better use of existing cache
2008-10-02 21:20 we already suck beyond belief for in-cache diff
2008-10-02 21:20 hmm? what do you mean?
2008-10-02 21:20 maybe fix the obvious breakage first
2008-10-02 21:20 ?
2008-10-02 21:20 try diffing two kernel trees
2008-10-02 21:20 and see how much memory you need to keep both 100% in cache
2008-10-02 21:20 it's ballooned from before, not because the tree got bigger
2008-10-02 21:21 [and yes doing all this is hard because of writes and read-only stuff, and when to dupe, when to modify in place, etc]
2008-10-02 21:21 $ time diff -qr linux-2.6.26.5_ linux-2.6.26.5
2008-10-02 21:21 real 0m1.208s
2008-10-02 21:22 hmm
2008-10-02 21:22 but how much cache am I using
2008-10-02 21:22 ok, now suppose you want to share cache pages, that means at find_cache_page miss time you need to be able to know the target page is already in some other cache
2008-10-02 21:22 the way to do that is by putting a forwarding pointer in the page cache for the device
2008-10-02 21:22 Cached: 3193700 kB
2008-10-02 21:22 hmm wonder how much of that was the 2 kernel trees?
2008-10-02 21:23 maze, 4 GB machine?
2008-10-02 21:23 yup
2008-10-02 21:23 try it with 512 MB
2008-10-02 21:23 $ du -hs *
2008-10-02 21:23 323M linux-2.6.26.5
2008-10-02 21:23 323M linux-2.6.26.5_
2008-10-02 21:23 won't work
2008-10-02 21:23 since the kernels take 640M themselves
2008-10-02 21:24 1G then
2008-10-02 21:24 Cached: 2552060 kB
2008-10-02 21:24 after deleting both kernel trees
2008-10-02 21:24 would it have gc'ed all the stuff I deleted?
2008-10-02 21:25 it should not have
2008-10-02 21:25 um
2008-10-02 21:25 no, of course it should have
2008-10-02 21:25 deleted = not cacheable
2008-10-02 21:26 so it seems to have deleted exactly the right amount -> 641MB
2008-10-02 21:27 oh, so the block dev and the file system mounted on top of it have separate page caches, which aren't shared till they hit disk
2008-10-02 21:27 which is why you should never touch the block dev directly, if there's an fs mounted on it
2008-10-02 21:27 not quite
2008-10-02 21:28 (except potentially for reads)
2008-10-02 21:28 they aren't shared, but the set of blocks in the two is disjoint
2008-10-02 21:28 better be
2008-10-02 21:28 uhm
2008-10-02 21:28 I don't think there's any guarantee for that
2008-10-02 21:28 file cache is data, blockdev cache is metadata
2008-10-02 21:28 the filesystem has to make that guarantee
2008-10-02 21:28 oh, you mean if the fs is using the blockdev
2008-10-02 21:29 sure
2008-10-02 21:29 the fs always uses the blockdev
2008-10-02 21:29 well
2008-10-02 21:29 if you do it from userspace you can easily screw that up
2008-10-02 21:29 it always uses the buffer cache
2008-10-02 21:30 you will see code in filesystems to invalidate buffer cache pages when metadata is freed
2008-10-02 21:30 ah
2008-10-02 21:30 in case they later get used for normal data
2008-10-02 21:30 in some caches, clean alias pages are left around, but that is an accident waiting to happen
2008-10-02 21:30 in some cases I meant
2008-10-02 21:30 yes
2008-10-02 21:31 classic badness
2008-10-02 21:31 has bitten many times
2008-10-02 21:31 in some cases it's impossible to avoid aliases
2008-10-02 21:31 namely when one block on a page is data and another is metadata
2008-10-02 21:32 the blocks themselves are not aliased but the pages are
2008-10-02 21:32 right
2008-10-02 21:33 right, this entire page cache system is nice and simple, and has good performance, but can't really represent all the edge cases, or more complex scenarios
2008-10-02 21:33 it's pretty simple-minded, true
2008-10-02 21:34 so how would you go about answering the question: in which page cache(s) is a given physical block already mapped?
2008-10-02 21:34 if you could change everything?
2008-10-02 21:35 I'm not sure yet, but it's pretty clear that (if possible to get good performance with something like this) we would want the minimum amount of duplication possible
2008-10-02 21:36 ie. if it's in one physical location on disk, it should remain in one (or zero) pages in ram regardless of how many levels it crosses
2008-10-02 21:36 all the way down
2008-10-02 21:36 block dev, virtual block dev, file system, network fs, userspace mmap, possibly userspace read hack opts
2008-10-02 21:37 specifying exactly what happens when you trigger a write to a page would be non-trivial
2008-10-02 21:38 in some cases you simply allocate a new page with duped data that is not mapped to anything else (a modify in ram only scenario)
2008-10-02 21:38 in others you'd need to allocate space on the filesystem and map to a 'sync to this location on disk' page
2008-10-02 21:38 etc
2008-10-02 21:39 my main worry would be that we could potentially be triggering spurious context switches on writes to read only pages
2008-10-02 21:39 spurious - wrong word - more like 'an excessive number of' that would hurt performance
2008-10-02 21:40 you need to be able to throw pages around
2008-10-02 21:41 a read from block dev through virtual dev (lvm) to fs would somehow result in the same page being held from all 3 places
2008-10-02 21:41 you're going to have a lot of trouble when data and metadata live on the same page
2008-10-02 21:41 and then a full page write would result in an existing page getting mapped into all 3 places at the same time (reverse process), while partial writes would flag pages as dirty etc
2008-10-02 21:42 yes, but I'm not sure metadata and data deserve to be treated separately
2008-10-02 21:42 you don't need a context switch when you write to a read only page, you can check the page flags explicitly
2008-10-02 21:42 and not take a fault
2008-10-02 21:42 in kernel space sure
2008-10-02 21:42 not so in userspace
2008-10-02 21:42 file data is certainly treated separately from metadata
2008-10-02 21:42 not going to change soon
2008-10-02 21:42 in kernel space I could simply check the counters
2008-10-02 21:43 in memory? inodes dentries, etc, sure
2008-10-02 21:43 but on disk?
2008-10-02 21:43 not so sure
2008-10-02 21:43 tux3 already kind of has less of a distinction than normal
2008-10-02 21:43 between?
2008-10-02 21:44 between a file and its contents, and metadata (logs, btrees, etc)
2008-10-02 21:44 oh, some metadata is mapped as data
2008-10-02 21:44 I think it should be possible to have all the metadata on disk behave like data, with the possible exception of (a few?) superblocks
2008-10-02 21:44 I'm sure we're going to hit some interesting recursions in there at some point
2008-10-02 21:45 sometimes I try to map file index metadata into a page cache and it never seems to work out very well
2008-10-02 21:45 well imagine we have a filesystem
2008-10-02 21:45 it already works
2008-10-02 21:46 now we need to store metadata
2008-10-02 21:46 so we write it out to a logfile in the first filesystem
2008-10-02 21:46 metadata != xattrs
2008-10-02 21:46 and store trees as sparse files in the first filesystem, etc
2008-10-02 21:46 sure
2008-10-02 21:46 that's actually done already
2008-10-02 21:46 now if the first filesystem is the filesystem for which we're storing metadata
2008-10-02 21:46 in lustre
2008-10-02 21:46 we've got a problem...
2008-10-02 21:47 since we get updates on updates
2008-10-02 21:47 however
2008-10-02 21:47 if all we update is up to a specific point
2008-10-02 21:47 and the rest is handled via forward logging
2008-10-02 21:47 ok, so you could make it work, but what is the win?
2008-10-02 21:47 then so long as you can guarantee that generating X KB of updates generates less than X KB of new updates, then it converges
2008-10-02 21:48 I think it should actually turn out to be pretty simple
2008-10-02 21:48 code wise
2008-10-02 21:48 if not conceptually
2008-10-02 21:48 so which piece of tux3 would go into a file next?
2008-10-02 21:48 no idea
2008-10-02 21:48 the inode table is problematic because of variable sized inodes
2008-10-02 21:48 does not map into a page cache nicely
2008-10-02 21:48 I'm still at the phase where I'm thinking this should be doable
2008-10-02 21:49 and since we rely on logging during mount anyway, it never has to fully converge
2008-10-02 21:49 otherwise, filesystems with fixed sized inodes could put the inode table in page cache and it would be a win
2008-10-02 21:49 the file system is always dirty
2008-10-02 21:50 tux3 only has three kinds of things that are not in files: 1) inode table 2) file indexes 3) update logs
2008-10-02 21:50 ie. it's always: what's on disk reflects last commit point + forward log which contains the changes which were made to the fs to perform the last commit (and any other changes from userspace in the meantime)
2008-10-02 21:50 ok, so why ain't the update log in a file?
2008-10-02 21:50 I can't see winning on any of those three by mapping to a file
2008-10-02 21:50 ah
2008-10-02 21:51 don't know ;)
2008-10-02 21:51 recursion for one thing
2008-10-02 21:51 log the updates to the log file
2008-10-02 21:51 see the forward log should just be a periodically front-truncated normal file
2008-10-02 21:51 and the win is?
2008-10-02 21:51 the win is all we have to support is normal files
2008-10-02 21:52 got to be more of a win than that
2008-10-02 21:52 to make up for the extra problems
2008-10-02 21:52 and except for the initial 'recover during mount' phase it's simple
2008-10-02 21:52 you share code for more stuff, you don't have to (at least theoretically) special case allocation for the forward log, etc
2008-10-02 21:53 since we have not done the log at all yet, if you come up with a convincing win argument, we can do it that way
2008-10-02 21:53 although that might be a bad thing
2008-10-02 21:53 sharing code is a minor plus
2008-10-02 21:53 -!- Kirantpatil(~kiran@122.167.195.107) has joined #tux3
2008-10-02 21:53 ACTION is looking for the big win
2008-10-02 21:53 dinner time
2008-10-02 21:53 ACTION thinks this would be the first file system which would deserve the name
2008-10-02 21:53 when is the next burst of activity on junkfs, or is there nothing interesting left to try?
2008-10-02 21:54 lots of interesting stuff
2008-10-02 21:54 it's also end-of-quarter time
2008-10-02 21:54 that was last week
2008-10-02 21:54 working on-and-off on options
2008-10-02 21:54 one would wish it was done last week ;-)
2008-10-02 21:54 we're scoring on monday, so I want to finish two more things I've left before then
2008-10-02 21:58 -!- Kirantpatil(~kiran@122.167.195.107) has left #tux3
2008-10-02 22:21 -!- amey(~amey@116.73.35.180) has joined #tux3
2008-10-02 22:43 ok, time to finish up the extent drop
2008-10-02 22:43 make it the default
2008-10-02 22:43 start exposing bugs
2008-10-02 23:08 flips: how was the sk8 today
2008-10-02 23:09 was a fine skate in the dark
2008-10-02 23:09 started at sunset
2008-10-02 23:09 just down to the sk8 park and back?
2008-10-02 23:09 up to 3rd st
2008-10-02 23:09 ah
2008-10-02 23:10 musicians on the strand were doing special things
2008-10-02 23:10 "funk you we're playing what we want"
2008-10-02 23:10 i got out at 3pm on the road bike
2008-10-02 23:10 rode up tuna canyon for the first time in months
2008-10-02 23:11 tuna is like entering a different zone altogether
2008-10-02 23:11 you get 100 yards off the pch, and you're in the wilderness
2008-10-02 23:11 sounds nice
2008-10-03 00:14 extents just landed
2008-10-03 00:15 probably with a big crash, tinkle sound
2008-10-03 00:15 bug hunters should have fun
2008-10-03 00:36 -!- amey(~amey@116.73.35.180) has left #tux3
2008-10-03 00:52 -!- Kirantpatil(~kiran@122.167.195.107) has joined #tux3
2008-10-03 00:53 -!- Kirantpatil(~kiran@122.167.195.107) has left #tux3
2008-10-03 01:19 -!- pgquiles_(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3
2008-10-03 01:31 -!- pgquiles(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3
2008-10-03 01:33 -!- Kirantpatil(~kiran@122.167.195.107) has joined #tux3
2008-10-03 01:33 -!- Kirantpatil(~kiran@122.167.195.107) has left #tux3
2008-10-03 02:20 -!- flips(~phillips@phunq.net) has joined #tux3
2008-10-03 03:18 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3
2008-10-03 03:51 flips: good work, was just reading the code
2008-10-03 04:20 -!- pgquiles(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3
2008-10-03 04:42 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-03 04:44 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3
2008-10-03 05:23 -!- FelipeS(~Felipe@lawn-128-61-118-191.lawn.gatech.edu) has joined #tux3
2008-10-03 06:15 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-03 07:08 morning tim_dimm
2008-10-03 10:05 flips : hello
2008-10-03 10:05 I just bought a new laptop (with no OS)
2008-10-03 10:05 :D
2008-10-03 10:05 Toshiba Sat. L300
2008-10-03 10:08 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-03 10:14 morning flips
2008-10-03 10:52 I just bought a PPC G4 :P
2008-10-03 10:52 specs?
2008-10-03 11:44 it will be quite small, 733 MHz, no L2 cache
2008-10-03 11:44 it will come with 512MB of RAM but I have 1GB from my old eMac
2008-10-03 11:45 it's a quicksilver: http://support.apple.com/specs/powermac/Power_Mac_G4_Quicksilver.html
2008-10-03 11:45 at 105$ including shipping I simply could not resist :P
2008-10-03 12:18 I've got a 1.5Ghz powerbook g4 I've been thinking about putting into linux duty
2008-10-03 12:18 let me know what you load onto it
2008-10-03 12:18 be curious to see what works
2008-10-03 12:19 I've heard there are limited drivers for things like trackpads on ppc
2008-10-03 12:21 I'll put mac os on it
2008-10-03 12:21 I ran linux for about 2 years on my iBook G3
2008-10-03 12:22 interesting experience
2008-10-03 12:44 flips and shapor give me major crap about me running mac os
2008-10-03 12:44 flips calls it "shiny"
2008-10-03 12:45 it's the only shiny os that I can run FCP on though
2008-10-03 12:50 what is FCP? Flexible Control Protocol? :P http://cs.jhu.edu/~razvanm/ipsn2008koala.pdf
2008-10-03 13:48 final cut pro
2008-10-03 13:48 video editing app
2008-10-03 13:48 my former life as a creative dude
2008-10-03 14:27 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-03 14:52 your FCP is better ;-)
2008-10-03 15:26 119 list members
2008-10-03 15:47 my new fit-pc slim should arrive any day
2008-10-03 15:47 ACTION was going to go up today but is lagging behind work and home related chores
2008-10-03 15:47 http://www.fit-pc.com/new/
2008-10-03 15:47 bh, so what did you like about the new code?
2008-10-03 15:49 it's simplifying the system
2008-10-03 15:49 indeed
2008-10-03 15:50 stuff like that is better done earlier than later
2008-10-03 15:50 remove 150 lines or so
2008-10-03 15:50 because of entropy related things if you buy into that model of looking at software development
2008-10-03 15:50 more still will be removed, so actually the LoC cost of extents gets close to zero
2008-10-03 15:50 complexity went up though
2008-10-03 15:50 yeah, the complexity of the file system is increasing but the code base is getting more and more dense, sign of maturation
2008-10-03 15:51 you're like the last hope I have for Linux file systems so I'm definitely interested in positive progress of this project
2008-10-03 15:55 hg diff -r921a58bdbf8b | diffstat
2008-10-03 15:55 b/user/test/filemap.c | 330 ++++++++++++++++++++++++++++++
2008-10-03 15:55 ...
2008-10-03 15:55 15 files changed, 906 insertions(+), 423 deletions(-)
2008-10-03 15:55 <- extents cost 517 lines so far
2008-10-03 15:55 how's it working so far ?
2008-10-03 15:55 ok as far as I've tested it which is not far
2008-10-03 15:56 waiting for you to download it and try it ;-)
2008-10-03 15:56 ACTION might be able to leave tonight or something
2008-10-03 15:56 yeah, looking for testers
2008-10-03 15:56 make filemap && ./filemap foodev
2008-10-03 15:56 you broadcast that to the world yet ?
2008-10-03 15:56 then it outputs lots of stuff
2008-10-03 15:56 if there are no exclamation marks, you're ok
2008-10-03 15:56 seen the post on tux3 ml?
2008-10-03 15:56 and the one on lkml?
2008-10-03 15:57 when ? a couple of days ago ?
2008-10-03 15:57 or yesterday ?
2008-10-03 15:57 today ?
2008-10-03 15:57 rising steadily up the "hot messages" list
2008-10-03 15:57 ok
2008-10-03 15:57 http://kerneltrap.org/
2008-10-03 15:58 I don't have a subscription to that unfortunately
2008-10-03 15:58 you don't need it
2008-10-03 15:58 just surf in
2008-10-03 15:58 http://groups.google.com/group/linux.kernel/browse_frm/thread/ce1094c9b82a6768/3f40ebbd1d197f60?lnk=gst&q=daniel+phillips#3f40ebbd1d197f60
2008-10-03 15:59 try and get some ibm folks to support it
2008-10-03 15:59 like those ibm folks that got the ext4 extents working
2008-10-03 15:59 funny you should mention that
2008-10-03 16:00 how about some of those novell folks?
2008-10-03 16:00 let me see, who is clueful re filesystems at suse
2008-10-03 16:00 andi for sure
2008-10-03 16:00 clueful about everything nearly
2008-10-03 16:01 ok, time to move the devel env to my laptop
2008-10-03 16:01 so I'm not completely useless for the next 5 days
2008-10-03 16:01 that is, the eee
2008-10-03 16:01 can't say how cool it is to be able to develop tux3 under uml and fuse on the eee
2008-10-03 16:02 flips: we're all working on kvm, scheduling, lockdep and stuff like that with one person overextended that has an interest in the oracle file system
2008-10-03 16:02 that's just in our group, most folks are spread pretty thin as is, same for me as well
2008-10-03 16:03 flips: what are you going to use the fitpc for?
2008-10-03 16:03 ACTION continues reading the post
2008-10-03 16:11 flips: the best way of getting resources is to get your project up to a certain stage of functionality where it has a popular following
2008-10-03 16:11 then it's much easier to convince folks to commit resources to something like tux3 development, that's just market reality unfortunately
2008-10-03 16:11 -!- kbingham(~kbingham@92.9.147.219) has joined #tux3
2008-10-03 16:12 If I was unestablished and unemployed, I'd be on it or -rt
2008-10-03 16:17 rzm|away, web server
2008-10-03 16:18 I was going to use my original fit pc for that but it got commandeered by my daughter
2008-10-03 16:18 bh, it already got there when the fuse port went in
2008-10-03 16:19 don't want to hear reasons for not contributing
2008-10-03 16:20 especially from novell
2008-10-03 16:20 who has about 1 bazillion times more resources than me
2008-10-03 16:21 pull somebody off mono ;)
2008-10-03 16:26 http://kerneltrap.org/ <- number one popular lkml message on kerneltrap
2008-10-03 16:53 flips: I disagree
2008-10-03 16:54 with?
2008-10-03 16:54 fuse definitely helps but it needs at least a kernel port an a significant following outside of folks just on this channel
2008-10-03 16:54 devs who wait until the code is already in the hands of users are not the devs I'm interested in
2008-10-03 16:55 novell doesn't have infinite amount of resources, quite the opposite
2008-10-03 16:55 of course I know that
2008-10-03 16:55 but compared to me
2008-10-03 16:55 then your best bet is doing what you're currently doing, training the staff to be able to do this kind of work yourself
2008-10-03 16:55 protestations about lack of resources get close to whining ;)
2008-10-03 16:56 dude, it's reality
2008-10-03 16:56 getting on my case, and I'm on your side, doesn't help the situation
2008-10-03 16:56 whining is reality, yes, especially from $billion corps
2008-10-03 16:56 like google
2008-10-03 16:56 masters of whining
2008-10-03 16:56 oh yes
2008-10-03 16:56 like google too, google whines better than anybody
2008-10-03 16:57 corporate slackasses ;)
2008-10-03 16:57 good thing there are actual devs with a clue to hand them their business model
2008-10-03 16:57 ACTION can't find receipts for reembursement :\
2008-10-03 16:58 anyway, until novell contributes a patch, novell is in the whining/slackass category, simple fact of life
2008-10-03 16:58 I don't speak for all of novell
2008-10-03 16:58 ACTION worres he's beginning to sound a bit like gregkh
2008-10-03 16:58 you're speaking to me
2008-10-03 16:58 I'm on your side
2008-10-03 16:58 I was speaking about novell
2008-10-03 16:59 so light a fire under those pansy asses
2008-10-03 16:59 whoops, is this all logged publicly
2008-10-03 16:59 well, if you're pressuring me it's going to get no where
2008-10-03 16:59 flips: :)
2008-10-03 16:59 doesn't really matter
2008-10-03 17:00 andrea ought to have some fun with this
2008-10-03 17:00 I'll ping him
2008-10-03 17:02 getting close to sk8 oclock
2008-10-03 17:03 bh, just define it as rt work and submit a patch for rt io sheduling
2008-10-03 17:03 can even be done in fuse
2008-10-03 17:03 yeah, right
2008-10-03 17:03 rt doesn't care about performance, only deadlines
2008-10-03 17:03 good way of getting me fired
2008-10-03 17:03 really?
2008-10-03 17:04 yeah, dong something you're not suppose to be dong
2008-10-03 17:04 doing
2008-10-03 17:04 I thought you were doing rt
2008-10-03 17:04 that and more
2008-10-03 17:04 but only for our -rt product
2008-10-03 17:04 that's lame
2008-10-03 17:04 flips: at this time
2008-10-03 17:04 that's life baby
2008-10-03 17:04 just include it in the product
2008-10-03 17:04 tell the pm
2008-10-03 17:04 and I can't just like quit Novell just to work on tux3, that's kind of silly
2008-10-03 17:04 "rt with tux3 rt schedule will rool the woorld"
2008-10-03 17:05 flips: the best I can do is when the right opportunity comes up, I'll spread the good word, but that's going to have limited influence if I'm considered a flake in the group/company
2008-10-03 17:05 rt filesystem is a key blocker for rt
2008-10-03 17:06 without it you've got no rt
2008-10-03 17:06 not for our purposes
2008-10-03 17:06 give it time
2008-10-03 17:06 however much one would like to pretend otherwise
2008-10-03 17:06 and keep working on stuff, it's looking good
2008-10-03 17:06 and you're getting folks on board, that's all good forward progress
2008-10-03 17:09 now if only we'd some some participation from novell
2008-10-03 17:10 course the satisfying point will be the day an oracle dev submits a patch
2008-10-03 17:10 probable a couple months away from that
2008-10-03 17:10 or maybe we have to wait for the cows to come home first, depends
2008-10-03 17:11 on enlightenment of management there mainly
2008-10-03 17:11 then when sun submits a patch we know we RLY ROOL
2008-10-03 17:12 when we expand the addressing to 128 bits we can rename as ZTS (Zigabyte Tux3 fileSystem)
2008-10-03 17:13 -!- ajonat(~ajonat@190.48.123.108) has joined #tux3
2008-10-03 18:26 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3
2008-10-03 18:26 -!- ajonat(~ajonat@190.48.123.108) has joined #tux3
2008-10-03 19:36 -!- ajonat(~ajonat@190.48.116.103) has joined #tux3
2008-10-03 19:43 -!- ajonat(~ajonat@190.48.116.103) has joined #tux3
2008-10-03 20:56 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-03 20:59 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-03 23:11 -!- Aks(~ankitsriv@123.237.70.127) has joined #tux3
2008-10-03 23:20 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-03 23:30 -!- wweng(~chatzilla@p67-47.acedsl.com) has joined #tux3
2008-10-03 23:57 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-04 00:38 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-10-04 00:39 ACTION is off to bed after almost 21h of uptime...
2008-10-04 00:40 -!- ChanServ changed mode/#tux3 -> +o flips
2008-10-04 00:41 -!- flips changed topic to "http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 pm Pacific Time ~ Next session Thursday Oct 9: friends of grab_cache_page ~ No Tux3 U on Tuesday 7th ~ flips out"
2008-10-04 00:42 -!- flips changed mode/#tux3 -> -o flips
2008-10-04 00:52 cool, the eee is all set up as a tux3 dev station
2008-10-04 00:52 mercurial rocks, the whole toolchain rocks
2008-10-04 01:41 yes, it does
2008-10-04 01:41 I really like the web interface for it and stuff
2008-10-04 01:42 it's quite nice
2008-10-04 03:18 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-04 03:18 -!- tim_dimm_(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-04 04:04 -!- kbingham(~kbingham@92.9.147.219) has joined #tux3
2008-10-04 07:55 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-04 08:29 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-04 09:11 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-04 09:44 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-04 11:50 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-04 11:56 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-04 12:26 ACTION is going to start on his SD to SF trip soon
2008-10-04 12:30 -!- ajonat(~ajonat@190.48.116.103) has joined #tux3
2008-10-04 12:35 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3
2008-10-04 12:53 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-04 12:54 -!- hubar(~chatzilla@p67-47.acedsl.com) has joined #tux3
2008-10-04 13:04 -!- hubar(~chatzilla@p67-47.acedsl.com) has left #tux3
2008-10-04 13:12 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-04 13:15 -!- paola(~paola@ppp-23-17.20-151.libero.it) has joined #tux3
2008-10-04 13:46 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-04 13:51 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-04 13:54 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-04 13:55 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-04 14:11 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3
2008-10-04 14:42 -!- paola(~paola@ppp-23-17.20-151.libero.it) has left #tux3
2008-10-04 14:52 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3
2008-10-04 15:24 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-04 17:28 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-04 18:18 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-04 18:35 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-04 18:41 -!- FelipeS_(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-04 20:48 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3
2008-10-04 20:49 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-04 20:50 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-04 22:13 -!- FelipeS_(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-04 23:20 -!- ajonat(~ajonat@190.48.116.103) has joined #tux3
2008-10-04 23:40 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3
2008-10-05 00:13 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-10-05 00:18 -!- RazvanM2(~razvanm2@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-10-05 01:59 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-05 03:43 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-05 04:25 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-05 04:50 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-05 05:10 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-05 07:44 -!- kbingham(~kbingham@92.1.55.205) has joined #tux3
2008-10-05 09:44 -!- kbingham(~kbingham@92.23.22.99) has joined #tux3
2008-10-05 10:13 -!- pgquiles(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3
2008-10-05 10:40 -!- kbingham(~kbingham@92.21.131.250) has joined #tux3
2008-10-05 10:45 -!- ajonat(~ajonat@190.48.122.100) has joined #tux3
2008-10-05 10:48 -!- Bobby(~Bobby@122.162.70.237) has joined #tux3
2008-10-05 10:55 hey guys
2008-10-05 10:55 hello all
2008-10-05 11:03 flips, there?
2008-10-05 11:03 got the extent code working?
2008-10-05 11:03 i think he does
2008-10-05 11:04 hey tim_dimm
2008-10-05 11:04 hey dude
2008-10-05 11:05 how's it going?
2008-10-05 11:05 hmm
2008-10-05 11:05 not great
2008-10-05 11:05 what's not great?
2008-10-05 11:06 been trying to get working on tux3 since long
2008-10-05 11:06 what's the block?
2008-10-05 11:08 not sure of the concepts
2008-10-05 11:08 and no time :(
2008-10-05 11:08 its not easy, and it does take time to digest
2008-10-05 11:10 i'm just now learning C, so my role has been evangelism so far
2008-10-05 11:10 hmm
2008-10-05 11:10 im going to put more work into this
2008-10-05 11:10 learning the concepts and the code
2008-10-05 11:11 is tux3 U helping?
2008-10-05 11:11 yeah, i just go through the logs
2008-10-05 11:11 the timing doesn't exactly match :(
2008-10-05 11:11 where r u located?
2008-10-05 11:11 india
2008-10-05 11:11 where in india?
2008-10-05 11:11 delhi
2008-10-05 11:11 its all the same time zone
2008-10-05 11:12 havent been tere, but have been to chennai
2008-10-05 11:12 oh
2008-10-05 11:12 ok
2008-10-05 11:12 whr u frm/
2008-10-05 11:12 LA
2008-10-05 11:12 los angeles
2008-10-05 11:12 why chennai?
2008-10-05 11:12 ya. im familiar
2008-10-05 11:13 i did visual effects for a film there
2008-10-05 11:13 ohk
2008-10-05 11:13 frameflow, got bought by sony imageworks
2008-10-05 11:14 ohk
2008-10-05 11:14 hey tim_dimm, g2g, ttyl
2008-10-05 11:14 cool, l8tr
2008-10-05 13:03 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-05 14:17 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-05 14:41 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-05 17:25 -!- kbingham(~kbingham@92.9.150.238) has joined #tux3
2008-10-05 17:57 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-05 19:04 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-05 21:57 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-05 23:02 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-05 23:23 -!- pranith(ca4bcee2@webchat.mibbit.com) has joined #tux3
2008-10-05 23:24 hey all
2008-10-05 23:39 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-06 00:12 flips: creating sourceforge page for tux3
2008-10-06 00:12 so that people with no mercurial can access it
2008-10-06 00:12 i will take the responsibility to keep the project there in sync with mercurial
2008-10-06 00:12 i hope this is ok with you
2008-10-06 00:33 -!- ajonat(~ajonat@190.48.122.100) has joined #tux3
2008-10-06 03:14 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3
2008-10-06 04:05 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-06 04:27 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3
2008-10-06 05:23 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-06 05:52 -!- tux3bot(~tux3bot@yzf.shapor.com) has joined #tux3
2008-10-06 05:53 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-06 05:54 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3
2008-10-06 07:30 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-06 09:19 -!- rollen(~none@195.129-243-81.adsl-dyn.isp.belgacom.be) has joined #tux3
2008-10-06 10:07 -!- Bobby(~Bobby@122.162.68.177) has joined #tux3
2008-10-06 10:09 -!- Bobby(~Bobby@122.162.68.177) has joined #tux3
2008-10-06 10:17 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3
2008-10-06 10:37 -!- Bobby(~Bobby@122.162.68.177) has joined #tux3
2008-10-06 10:37 hey all
2008-10-06 10:37 hey Bobby
2008-10-06 10:38 hello tim_dimm
2008-10-06 10:38 you r now help?
2008-10-06 10:38 heh
2008-10-06 10:39 -!- Bobby_(~Bobby@122.162.68.177) has joined #tux3
2008-10-06 10:39 forgot to log off from office :D
2008-10-06 10:40 it was not accepting my nick changes
2008-10-06 10:40 it happens
2008-10-06 10:51 tim_dimm, which movie did u work on??
2008-10-06 10:51 here in chennai?
2008-10-06 10:51 my movie, midgetman
2008-10-06 10:51 was never completed
2008-10-06 10:51 hmm
2008-10-06 10:51 which firm?
2008-10-06 10:52 frameflow was purchased by sony
2008-10-06 10:52 ohk
2008-10-06 10:52 indy
2008-10-06 10:52 what are u doing now?
2008-10-06 10:52 i mean job
2008-10-06 10:53 after midgetman, I was doing color correction for film
2008-10-06 10:53 ok
2008-10-06 10:53 otherwise known as DI (digital intermiediate)
2008-10-06 10:53 and I learned about a ssd startup - violin memory
2008-10-06 10:53 and believed that could accelerate certain post production processes
2008-10-06 10:54 so I made a few introductions, then found myself employed by violin
2008-10-06 10:54 hmm
2008-10-06 10:54 yeah, violin.. that 16gb ram disk...
2008-10-06 10:54 which lead me to MetaRAM after violin didn't get market traction
2008-10-06 10:54 512GB ram disk
2008-10-06 10:54 oh
2008-10-06 10:54 awesome..
2008-10-06 10:55 metaram?
2008-10-06 10:55 I was hoping they'd release flash
2008-10-06 10:55 MetaRAM makes 8GB DDR2 rDIMMs and 8GB and 16GB DDR3 rDIMMS
2008-10-06 10:55 and I'm doing bizdev for them
2008-10-06 10:56 hmm
2008-10-06 10:56 started just in media + entertaiment, but now cover all
2008-10-06 10:56 u seem to be having fun :)
2008-10-06 10:56 memory market is tough right now
2008-10-06 10:56 all markets are tough :)
2008-10-06 10:56 now*
2008-10-06 10:56 all the big memory fabs are losing money
2008-10-06 10:56 yeah
2008-10-06 10:57 especially with the recession and all
2008-10-06 10:57 is india in a recession too?
2008-10-06 10:57 thought your economy was growing fast
2008-10-06 10:57 hmm
2008-10-06 10:57 the market reached its 2 year low point today
2008-10-06 10:57 US market affects all other markets
2008-10-06 10:58 of course
2008-10-06 10:58 foreign inflows are drying up..
2008-10-06 10:58 people are pulling out of markets left and right
2008-10-06 10:58 how is that affecting IT spending?
2008-10-06 10:59 IT spending is also being cut overseas
2008-10-06 10:59 and that will affect the IT companies here which mostly do outsourcing work...
2008-10-06 10:59 speaking of work, what do u do?
2008-10-06 11:00 i work in mentor graphics
2008-10-06 11:00 ever heard of it?
2008-10-06 11:00 name is familiar
2008-10-06 11:00 got a website?
2008-10-06 11:00 and no, its not a graphics company :)
2008-10-06 11:00 yup
2008-10-06 11:00 mentor.com
2008-10-06 11:00 electronic design
2008-10-06 11:01 yup
2008-10-06 11:01 thought I'd heard it
2008-10-06 11:01 what's your role at mentor?
2008-10-06 11:02 im member, technical staff
2008-10-06 11:02 how'd you hear about tux3?
2008-10-06 11:02 hmm
2008-10-06 11:02 im an kernel enthusiast
2008-10-06 11:03 like operating systems in general
2008-10-06 11:03 lkml ?
2008-10-06 11:03 been reading about os for 1-2 years
2008-10-06 11:03 yup
2008-10-06 11:03 i was getting nowhere without a project
2008-10-06 11:03 so finally decided to take the plunge
2008-10-06 11:04 and how's that going?
2008-10-06 11:04 not great :(
2008-10-06 11:04 my day job is like my entire day job
2008-10-06 11:04 dont get the time exactly
2008-10-06 11:04 to devote to tux3
2008-10-06 11:04 so its a question of time then
2008-10-06 11:05 yup
2008-10-06 11:05 i need to make up more of it
2008-10-06 11:05 well, its cool you're here though
2008-10-06 11:05 hmm, im trying :)
2008-10-06 11:06 I've got the benefit of being able to hear about it straight from flips
2008-10-06 11:06 I met him at linux world over a year ago
2008-10-06 11:06 he tested violin when he was at google
2008-10-06 11:06 oh! cool
2008-10-06 11:07 he isn't at google anymore?
2008-10-06 11:07 nope
2008-10-06 11:07 then/
2008-10-06 11:07 tux3 100%
2008-10-06 11:07 hmm, ok
2008-10-06 11:07 tux3 96%, skate 4%
2008-10-06 11:07 hehe
2008-10-06 11:07 lol
2008-10-06 11:07 he's gotten quite good
2008-10-06 11:07 just started skating in feb
2008-10-06 11:07 i never understood what sk8 oclock meant
2008-10-06 11:08 I do downhill
2008-10-06 11:08 http://homepage.mac.com/timothyhuber/downhill/iMovieTheater68.html
2008-10-06 11:08 grt
2008-10-06 11:08 plus skate on the boardwalk in venice and santa monica
2008-10-06 11:08 venice??
2008-10-06 11:08 venice is in los angeles
2008-10-06 11:09 hehe
2008-10-06 11:09 i was thinking about the other venice... :)
2008-10-06 11:09 wow
2008-10-06 11:09 venice was designed in early 1900 by a developer to resemble venice, italy
2008-10-06 11:09 cool videoo
2008-10-06 11:10 is that you in that video?
2008-10-06 11:10 http://maps.google.com/maps?f=q&hl=en&geocode=&q=Venice,+CA+90291&ie=UTF8&z=14&iwloc=addr
2008-10-06 11:10 yeah, I'm in red
2008-10-06 11:10 and the other?
2008-10-06 11:10 george merkert, three time world champ
2008-10-06 11:10 for downhill inline
2008-10-06 11:10 woho
2008-10-06 11:10 nicee
2008-10-06 11:11 and he is a visual effects producer
2008-10-06 11:11 total recall, starship troopers and ~20 others
2008-10-06 11:11 hmm
2008-10-06 11:11 nice
2008-10-06 11:11 awesome..
2008-10-06 11:11 scott peer is our other big dh inline guy
2008-10-06 11:11 works at the JPL
2008-10-06 11:12 Cassini navigation software is his design
2008-10-06 11:12 aren't there any vehicles on the road?
2008-10-06 11:13 heh
2008-10-06 11:13 yeah, but not many
2008-10-06 11:13 we pick roads with little traffic
2008-10-06 11:13 it looked like you were going pretty fast
2008-10-06 11:13 its remote, and very curvy
2008-10-06 11:13 hmm
2008-10-06 11:13 in that video, about 35mph
2008-10-06 11:13 we top out in the mid-high 50's
2008-10-06 11:14 there's one video on there where I was wearing a gps
2008-10-06 11:14 hmm
2008-10-06 11:14 and at the end of the run it shows 51.5
2008-10-06 11:14 hmm, cool
2008-10-06 11:14 its the mammoth video
2008-10-06 11:15 seeing it
2008-10-06 11:17 shapor does downhill with us now
2008-10-06 11:17 hmm, you all stay in LA??
2008-10-06 11:17 flips won't do it with us
2008-10-06 11:17 yeah
2008-10-06 11:17 why?
2008-10-06 11:17 la is pretty cool
2008-10-06 11:17 there's plenty to do here
2008-10-06 11:17 naah, i mean why doesn't flips do downhill with you? :)
2008-10-06 11:17 and there are mountains right here
2008-10-06 11:17 oh,
2008-10-06 11:18 :D
2008-10-06 11:18 doesn't want to go that fast
2008-10-06 11:18 ohk
2008-10-06 11:18 he's heard about our crashes
2008-10-06 11:19 <-never gotten hurt, but crash a few times/year
2008-10-06 11:19 hehe
2008-10-06 11:19 that's what body armor is for
2008-10-06 11:20 watchin mammoth video... which one is u?
2008-10-06 11:21 i'm carrying the camera
2008-10-06 11:21 :)
2008-10-06 11:21 ohk
2008-10-06 11:22 tim on stunt, crouching skater, tuna canyon, tuna bottom section, brake testing on little t
2008-10-06 11:22 man, we've gotten way off topic
2008-10-06 11:22 :-)
2008-10-06 11:23 hehe
2008-10-06 11:23 yeah
2008-10-06 11:27 i am getting plenty of errors using make tests
2008-10-06 11:27 looking into them...
2008-10-06 11:41 tim_dimm, you looked around the tux3 code?
2008-10-06 11:41 yup
2008-10-06 11:41 which part are u familiar with?
2008-10-06 11:44 feeding my daughter...gimme a few to respond
2008-10-06 11:45 oh! sorry, please go on
2008-10-06 11:45 goin for a smoke, be back soon
2008-10-06 11:50 -!- Bobby_(~Bobby@122.162.68.179) has joined #tux3
2008-10-06 11:52 -!- Bobby_(~Bobby@122.162.68.179) has joined #tux3
2008-10-06 11:57 -!- Bobby_(~Bobby@122.162.68.179) has joined #tux3
2008-10-06 11:58 back
2008-10-06 11:59 -!- pranith_(~bobby@122.162.68.179) has joined #tux3
2008-10-06 12:17 hmm
2008-10-06 13:10 -!- zbrown(~rufius@208.64.37.45) has joined #tux3
2008-10-06 13:48 pranith: yeah looks like the extent stuff broke something in dleaf.c.. not too suprising
2008-10-06 14:55 folks
2008-10-06 16:23 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3
2008-10-06 17:26 -!- ajonat(~ajonat@190.48.122.100) has joined #tux3
2008-10-06 21:57 -!- flipz(~daniel@d75-157-56-124.bchsia.telus.net) has joined #tux3
2008-10-06 22:02 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-06 22:13 -!- daniel(~daniel@d75-157-56-124.bchsia.telus.net) has joined #tux3
2008-10-06 23:24 hey all
2008-10-06 23:25 anyone here?
2008-10-06 23:27 hi pranith
2008-10-06 23:27 wha
2008-10-06 23:28 what's up?
2008-10-06 23:28 hey flipz, u back!!
2008-10-06 23:28 i thought u were out today too
2008-10-06 23:29 I'm still on the road
2008-10-06 23:29 oh
2008-10-06 23:29 seems like ur extent stuff broke a lot of things...
2008-10-06 23:29 trying to find them out
2008-10-06 23:29 heh
2008-10-06 23:29 have fun reading the code
2008-10-06 23:29 messages on the mailing list?
2008-10-06 23:29 :)
2008-10-06 23:29 yeah, i will
2008-10-06 23:30 please
2008-10-06 23:30 I'll have a read when I see something
2008-10-06 23:30 ohkies
2008-10-06 23:30 ill post the errors
2008-10-06 23:30 thanks
2008-10-06 23:30 how do i turn the trace function on?
2008-10-06 23:31 #define trace trace_on
2008-10-06 23:31 instead of #define trace trace_off
2008-10-06 23:31 hmm
2008-10-06 23:31 ohk
2008-10-06 23:31 thnx
2008-10-06 23:32 flipz: its defined as trace on...
2008-10-06 23:32 #ifndef trace
2008-10-06 23:32 #define trace trace_on
2008-10-06 23:32 #endif
2008-10-06 23:33 unless it was already defined
2008-10-06 23:33 hmm, ok. trace has been defined somewhere else...
2008-10-06 23:33 ACTION searching
2008-10-06 23:34 look for #include "something.c"
2008-10-06 23:35 ok
2008-10-06 23:35 trace.h
2008-10-06 23:39 shapor: mind updating the bitbucket mirror??
2008-10-06 23:40 it's not exactly mirroring stuff now
2008-10-07 00:10 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3
2008-10-07 01:49 hey
2008-10-07 01:51 hello bh
2008-10-07 01:56 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-07 02:05 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-07 02:17 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-07 02:38 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-07 02:42 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-07 02:50 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3
2008-10-07 05:15 hey all
2008-10-07 05:15 real life is boring
2008-10-07 08:00 -!- Kirantpatil(~kiran@122.167.195.60) has joined #tux3
2008-10-07 08:01 -!- Kirantpatil(~kiran@122.167.195.60) has left #tux3
2008-10-07 09:44 greetings earthlings
2008-10-07 09:44 pranisout: what makes you think bitbucket isn't updating?
2008-10-07 09:47 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3
2008-10-07 10:01 -!- pranith(~Bobby@122.162.71.155) has joined #tux3
2008-10-07 10:01 hello
2008-10-07 10:01 anyone here
2008-10-07 10:06 -!- pranith(~Bobby@122.162.71.155) has joined #tux3
2008-10-07 10:07 hello
2008-10-07 10:25 hi
2008-10-07 10:42 shapor, hello
2008-10-07 10:57 you were mentioning the bitbucket mirror yesterday
2008-10-07 10:57 it looks up to date to me
2008-10-07 11:05 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3
2008-10-07 11:06 shapor, hmm
2008-10-07 11:07 yup, my mistake
2008-10-07 11:07 sorry
2008-10-07 11:14 -!- Bobby_(~Bobby@122.162.69.183) has joined #tux3
2008-10-07 11:20 its just that flips has been out of commission so there aren't any updates ;)
2008-10-07 11:29 hmm
2008-10-07 11:35 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-07 11:36 hey tim_dimm
2008-10-07 11:36 hey pranith
2008-10-07 12:59 -!- nataliep(~nataliep@207.47.98.129.static.nextweb.net) has joined #tux3
2008-10-07 15:59 -!- flipz(~daniel@d75-157-56-124.bchsia.telus.net) has joined #tux3
2008-10-07 16:45 -!- pravin(~pravin@145-116-238-192.uilenstede.casema.nl) has joined #tux3
2008-10-07 17:10 -!- pravin(~pravin@145-116-238-192.uilenstede.casema.nl) has joined #tux3
2008-10-07 17:32 flipz: ping
2008-10-07 17:33 hi tim_dimm
2008-10-07 17:33 meet me on the bat channel
2008-10-07 17:37 just a sec
2008-10-07 17:37 k
2008-10-07 18:30 -!- garns(~garns@marvin.cs.uni-dortmund.de) has joined #tux3
2008-10-07 18:42 -!- ajonat(~ajonat@190.48.122.100) has joined #tux3
2008-10-07 18:57 -!- ajonat(~ajonat@190.48.122.100) has joined #tux3
2008-10-07 19:06 -!- flipz(~daniel@d75-157-56-124.bchsia.telus.net) has joined #tux3
2008-10-07 19:37 -!- daniel_(~daniel@d75-157-56-124.bchsia.telus.net) has joined #tux3
2008-10-07 19:43 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
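The trace confusion on 2008-10-06 comes down to include order: a `#ifndef trace` default only takes effect if nothing earlier has already defined `trace`, so editing it appears to do nothing. A minimal sketch of that behavior follows, assuming hypothetical stand-ins for `trace_on`/`trace_off` (their real definitions live in the tux3 tree and are not shown in this log); one early `#define` plays the role of the header the log traced it to (trace.h), and `trace_print`, `trace_lines` and `demo` are illustrative names only.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static int trace_lines; /* counts printed trace lines, for the demo only */

/* Hypothetical helper standing in for whatever tux3's trace_on expands to. */
void trace_print(const char *func, const char *fmt, ...)
{
	va_list ap;
	assert(fmt != NULL);
	printf("%s: ", func);
	va_start(ap, fmt);
	vprintf(fmt, ap);
	va_end(ap);
	printf("\n");
	trace_lines++;
}

/* Hypothetical on/off variants: one prints, the other compiles to nothing. */
#define trace_on(...) trace_print(__func__, __VA_ARGS__)
#define trace_off(...) do { } while (0)

/* Stands in for an earlier header (trace.h in the log) that has already
 * chosen a setting before the .c file's own default is seen. */
#define trace trace_off

/* The per-file default quoted in the log: it is skipped because trace is
 * already defined, which is why changing it had no visible effect. */
#ifndef trace
#define trace trace_on
#endif

static int demo(int x)
{
	trace("x = %i", x);
	return x * 2;
}
```

With the early `#define trace trace_off` removed (or changed to `trace_on`), the `#ifndef` default wins and `demo(21)` prints `demo: x = 21`.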
2008-10-07 19:45 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has left #tux3
2008-10-07 20:12 -!- Kirantpatil(~kiran@122.167.194.225) has joined #tux3
2008-10-07 20:17 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3
2008-10-07 20:49 -!- Kirantpatil(~kiran@122.167.194.225) has left #tux3
2008-10-07 21:36 -!- ajonat(~ajonat@190.48.122.100) has joined #tux3
2008-10-07 22:13 hey all
2008-10-07 22:23 hi prantih
2008-10-07 22:23 think you typoed your nick
2008-10-07 22:25 flipz, hi
2008-10-07 22:26 yeah, :(
2008-10-07 22:26 hehe
2008-10-07 22:26 u still on road?
2008-10-07 22:41 still
2008-10-07 22:41 pranith, were you going to report a bug to the mailing list?
2008-10-07 22:51 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-07 23:04 flipz, yup.. doing it now...
2008-10-07 23:04 flipz, was thinking of debugging it myself
2008-10-07 23:05 flipz, let me share it with you
2008-10-07 23:10 good to post whether you debug it yourself or not
2008-10-07 23:57 flipz, anything?
2008-10-08 00:08 pranith, I see the same valgrind issues
2008-10-08 00:10 yeah i noticed those too
2008-10-08 00:11 what is the actual bug?
2008-10-08 00:12 good question
2008-10-08 00:12 I see the very first valgrind complaint does not really matter, it is assigning an invalid groups pointer, but that pointer will be ignored because the code relies on knowing groups = 0 there, and should not use the groups pointer in that case
2008-10-08 00:12 the group pointer I meant
2008-10-08 00:13 I mean, what is the symptom? Just that make tests stops there?
2008-10-08 00:14 the next interesting valgrind complaint at line 419 is just a printf
2008-10-08 00:17 the issue at 469 is, I didn't set dleaf->free or ->used at the end of the leaf packing operations
2008-10-08 00:18 dleaf_dump is complaining about an unitialized count field, which sounds bad but the leaf seems perfectly ok when printed out
2008-10-08 00:19 and there is a leak of 1K in the test
2008-10-08 00:19 shapor, that should be enough for you to kill the all off
2008-10-08 00:20 the first one above can be fixed by not doing the two offending assignments if leaf->groups is zero
2008-10-08 00:21 or maybe better, assigning the two pointers in the struct dwalk to NULL
2008-10-08 00:31 I won't be able to fix these for another couple days, anybody else is welcome
2008-10-08 00:43 flipz: no its not copying the pointer at 641
2008-10-08 00:43 the first error is a problem
2008-10-08 00:48 flipz, is the leak genuine?
2008-10-08 00:48 don't know
2008-10-08 00:48 try putting in exit(1) instead of return
2008-10-08 00:53 shapor, walk->group at 641 is valid but *walk->group is not
2008-10-08 00:54 so right, the pointer is fine, it's the object that's invalid
2008-10-08 00:55 ACTION goes for some zzzs
2008-10-08 01:14 flipzzz: http://arstechnica.com/journals/linux.ars/2008/10/07/wizbit-a-linux-filesystem-with-distributed-version-control
2008-10-08 01:14 very interesting
2008-10-08 01:27 wow shiny
2008-10-08 01:28 uses gnome vfs I think I saw
2008-10-08 01:35 yeah
2008-10-08 01:44 ok really sleeping this time
2008-10-08 02:42 -!- less(~less@145-116-238-192.uilenstede.casema.nl) has joined #tux3
2008-10-08 05:05 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-08 05:10 hey tim_dimm
2008-10-08 05:10 hey pranith
2008-10-08 05:10 i'm up feeding my twins
2008-10-08 05:10 oh, i thought u had a girl
2008-10-08 05:10 lucky u
2008-10-08 05:10 boy and a girl
2008-10-08 05:11 cool
2008-10-08 05:12 how old?
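flips offered two fixes for the first valgrind complaint: skip the two assignments when leaf->groups is zero, or set the two struct dwalk pointers to NULL. Below is a sketch of the NULL variant under heavily simplified, hypothetical layouts: `struct group`, `struct entry`, the `top` and `entries` fields and `dwalk_start` are illustrative names, not tux3's actual dleaf.c code. It also reflects shapor's observation that the computed group pointer can be a perfectly valid address while the object it points to was never written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; the real dleaf.c layouts differ. */
struct group { unsigned count; };
struct entry { unsigned limit; };

struct dleaf {
	unsigned groups;        /* number of group records in use */
	struct group *top;      /* hypothetical: one past the last group slot */
	struct entry *entries;  /* hypothetical: first entry in the leaf */
};

struct dwalk {
	struct dleaf *leaf;
	struct group *group;    /* current group, NULL if the leaf has none */
	struct entry *entry;    /* current entry, NULL if the leaf has none */
};

static void dwalk_start(struct dwalk *walk, struct dleaf *leaf)
{
	assert(walk && leaf);
	walk->leaf = leaf;
	if (!leaf->groups) {
		/* top - 1 would still be an address inside the block, but no
		 * group has been written there, so *walk->group would read
		 * uninitialized memory, the kind of access valgrind reports.
		 * NULL makes any stray use fail immediately instead. */
		walk->group = NULL;
		walk->entry = NULL;
		return;
	}
	walk->group = leaf->top - 1;  /* groups are stored top-down here */
	walk->entry = leaf->entries;
}
```

Callers would then test `walk->group` (or `leaf->groups`) before dereferencing, which is the invariant flips describes the code as already relying on.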
2008-10-08 05:12 4 weeks
2008-10-08 05:12 quite a handful
2008-10-08 05:13 oh, cool
2008-10-08 05:42 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-08 07:06 -!- kbingham(~kbingham@cvs.mpc-ogw.co.uk) has joined #tux3
2008-10-08 08:39 -!- Kirantpatil(~kiran@122.167.197.37) has joined #tux3
2008-10-08 08:39 -!- Kirantpatil(~kiran@122.167.197.37) has left #tux3
2008-10-08 08:42 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-08 10:27 hey all
2008-10-08 11:27 -!- daniel_(~daniel@d75-157-56-124.bchsia.telus.net) has joined #tux3
2008-10-08 13:38 folks
2008-10-08 13:59 the mails about encoding of extent information: To me, it sounds like a variant of a buddy-system
2008-10-08 14:40 what's that ?
2008-10-08 15:08 data, right, and he mentions knuth too
2008-10-08 15:09 but it is an original application
2008-10-08 15:10 question is: is the per pointer bit saving more than the extra pointers required?
2008-10-08 15:10 It is quite possible that it is
2008-10-08 15:10 need to spreadsheet that
2008-10-08 15:26 worth giving it a shot anyway
2008-10-08 15:26 g'night
2008-10-08 15:47 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-08 17:59 -!- ajonat(~ajonat@190.48.94.249) has joined #tux3
2008-10-08 18:36 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-08 18:51 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-08 20:03 -!- Kirantpatil(~kiran@122.167.211.254) has joined #tux3
2008-10-08 21:20 -!- natalie(~natalie@72.14.228.1) has joined #tux3
2008-10-08 21:34 -!- natalie(~natalie@72.14.228.1) has left #tux3
2008-10-08 21:35 -!- Kirantpatil(~kiran@122.167.192.198) has joined #tux3
2008-10-08 21:35 -!- Kirantpatil(~kiran@122.167.192.198) has left #tux3
2008-10-08 22:25 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-08 23:21 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-08 23:32 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-08 23:53 -!- pranith(~Bobby@122.162.67.161) has joined #tux3
2008-10-08 23:53 hey all
2008-10-09 00:00 -!- flips(~phillips@phunq.net) has joined #tux3
2008-10-09 00:40 reading dleaf.c
2008-10-09 00:40 will try to run some tests today..
2008-10-09 00:40 hoping to find atleast one bug :D
2008-10-09 00:41 to make myself useful here
2008-10-09 00:48 pranith, I spotted a couple of bugs
2008-10-09 00:48 will put in fixes pretty soon
2008-10-09 00:49 stupid things
2008-10-09 00:49 changed the interface to dleaf_dump
2008-10-09 00:49 will change it back I think
2008-10-09 00:49 hmm
2008-10-09 00:49 ohk
2008-10-09 00:49 just running the tests is already useful
2008-10-09 00:49 hmm
2008-10-09 00:49 stuff like tux3 mkfs
2008-10-09 00:50 ?
2008-10-09 00:50 make tux3?
2008-10-09 00:50 and echo foo | tux3 write testdev testfile
2008-10-09 00:50 you do make tux3 && ./tux3 mkfs testdev
2008-10-09 00:50 for example
2008-10-09 00:50 we need some docs around now
2008-10-09 00:50 -!- pranith(~Bobby@122.162.67.161) has joined #tux3
2008-10-09 00:51 sry, dc
2008-10-09 00:52 flips, u back?
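On the buddy-system question from 2008-10-08: one reading of the tradeoff is that a buddy-style extent records only a power-of-two order (6 bits cover every size from 1 to 2^63 blocks) instead of a full block count, but an arbitrarily aligned run of blocks then needs several such extents. The sketch below counts how many, using the classic buddy decomposition into aligned power-of-two pieces. It is not the encoding proposed in the mails, which this log does not show; `struct buddy_extent` and `buddy_split` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical buddy-style extent: covers 1 << order blocks starting at
 * block, where block is always a multiple of 1 << order. */
struct buddy_extent {
	uint64_t block;
	unsigned order;
};

/* Split [start, start + count) into maximal aligned power-of-two pieces.
 * Stores at most max pieces in out and returns how many are needed. */
static unsigned buddy_split(uint64_t start, uint64_t count,
			    struct buddy_extent *out, unsigned max)
{
	unsigned n = 0;

	assert(start + count >= start); /* range must not wrap around */
	while (count) {
		unsigned order = 0;
		/* grow while the next larger piece stays aligned and fits */
		while (order < 63) {
			uint64_t size = (uint64_t)1 << (order + 1);
			if ((start & (size - 1)) || size > count)
				break;
			order++;
		}
		if (n < max)
			out[n] = (struct buddy_extent){ start, order };
		n++;
		start += (uint64_t)1 << order;
		count -= (uint64_t)1 << order;
	}
	return n;
}
```

For example, 11 blocks starting at block 5 split into three pieces (1 block at 5, 2 at 6, 8 at 8), while 16 blocks at block 0 fit in one. Running a distribution of real extents through a counter like this is one way to "spreadsheet" whether the per-pointer bit saving outweighs the extra pointers.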
2008-10-09 02:44 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-09 03:11 -!- kbingham(~kbingham@cvs.mpc-ogw.co.uk) has joined #tux3
2008-10-09 03:22 back now
2008-10-09 03:22 next move is to get some sleep
2008-10-09 04:24 -!- pranith(~Bobby@122.162.67.161) has joined #tux3
2008-10-09 04:24 hey all
2008-10-09 04:55 -!- Bobby_(~Bobby@122.162.67.161) has joined #tux3
2008-10-09 04:55 -!- Bobby_(~Bobby@122.162.67.161) has left #tux3
2008-10-09 05:03 -!- data(~data@echo489.server4you.de) has joined #tux3
2008-10-09 07:34 -!- pranith(~Bobby@122.162.67.161) has joined #tux3
2008-10-09 07:49 -!- pgquiles(~pgquiles@16.Red-83-41-239.dynamicIP.rima-tde.net) has joined #tux3
2008-10-09 08:32 -!- pgquiles(~pgquiles@16.Red-83-41-239.dynamicIP.rima-tde.net) has joined #tux3
2008-10-09 08:34 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-09 09:04 -!- pranith(~Bobby@122.162.67.161) has joined #tux3
2008-10-09 09:15 hello
2008-10-09 09:23 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3
2008-10-09 09:23 ACTION is back :P
2008-10-09 09:28 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-09 09:40 hey pranith
2008-10-09 10:04 morning tim_dimm
2008-10-09 10:04 morning flips
2008-10-09 10:04 got your sleep?
2008-10-09 10:04 some of it
2008-10-09 10:04 you?
2008-10-09 10:04 feeling rejuvenated?
2008-10-09 10:04 hah 2008-10-09 10:04 you are hilarious dude 2008-10-09 10:05 ;-) 2008-10-09 10:27 -!- less(~less@145-116-238-192.uilenstede.casema.nl) has joined #tux3 2008-10-09 11:15 there we go, tux3 command should function again 2008-10-09 11:16 dleaf_dump interface was changed and commented out of the ops, caused seg fault 2008-10-09 11:16 now put back as it was 2008-10-09 11:16 got to fix the valgrind issues, none of which seem serious 2008-10-09 11:17 then there is a real bug in the new extents stuff that makes tux3 read seg fault 2008-10-09 11:17 tux3 write seems to work ok 2008-10-09 11:17 then time to write a mail on atomic commit 2008-10-09 11:18 and there is tux3 u tonight 2008-10-09 11:18 -!- ChanServ changed mode/#tux3 -> +o flips 2008-10-09 11:19 -!- flips changed topic to "http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 pm Pacific Time ~ Next session: friends of grab_cache_page " 2008-10-09 11:19 -!- flips changed mode/#tux3 -> -o flips 2008-10-09 12:43 shapor, found the valgrind issue, it was real 2008-10-09 12:53 -!- ajonat(~ajonat@190.48.94.249) has joined #tux3 2008-10-09 12:58 make tests compile without valgrind errors now 2008-10-09 12:59 bunch of little things, real bugs 2008-10-09 12:59 -!- alaine(~alaine@kevbroadley.demon.co.uk) has joined #tux3
2008-10-09 13:01 I wonder if it is actually good to have make mkfs do its thing in /tmp 2008-10-09 13:02 makes for more typing running tests 2008-10-09 13:02 but does keep the local source free of big loopback volumes 2008-10-09 13:03 good call on the autokill 2008-10-09 13:43 i like bitbucket's diffs 2008-10-09 13:43 http://www.bitbucket.org/shapor/tux3/changeset/1b6cf87c7234/ 2008-10-09 13:43 make tests runs successfully now 2008-10-09 13:44 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-09 14:03 tux3 read has a segfault in the new extents code 2008-10-09 14:03 true, bitbucket diffs are nice 2008-10-09 14:05 for some reason the bitbucket mirror is lagged 77 minutes 2008-10-09 14:05 oh 2008-10-09 14:05 sorry 2008-10-09 14:05 because I used your url ;) 2008-10-09 14:24 this has xattr bugs: make inode && ./inode foodev 2008-10-09 14:24 there should be no xattrs but the inode table listing thinks there are 2008-10-09 14:24 something stupid 2008-10-09 15:03 shapor: trac does a similar output and I think they are using some kind of package for it 2008-10-09 15:03 but it does look nice 2008-10-09 17:14 Tonight on tux3 u, provided I can stay awake that long, we will be looking at the relationship between buffers and pages in the page cache 2008-10-09 17:21 2.6.27 is out with lockless page cache 2008-10-09 17:21 changes at the heart of tux3-u :) 2008-10-09 17:27 hey flips 2008-10-09 17:28 shapor: it's been in -rt forever 2008-10-09 17:28 and is a major pain in the ass regarding scalability 2008-10-09 17:33 bh: oh? 2008-10-09 17:34 howso 2008-10-09 19:13 -!- FelipeS_(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-09 19:55 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-09 20:00 who's here tonight? 2008-10-09 20:00 ACTION is 2008-10-09 20:00 Ral will also be.
2008-10-09 20:00 saw shapor not too long ago 2008-10-09 20:01 let's start in two places this time 2008-10-09 20:01 http://lxr.linux.no/linux+v2.6.26.6/include/linux/mm_types.h#L36 <- struct page 2008-10-09 20:02 http://lxr.linux.no/linux+v2.6.26.6/include/linux/buffer_head.h#L60 <- struct buffer_head 2008-10-09 20:03 struct page is the thing we use as a handle for a physical page 2008-10-09 20:03 it has an object count so we know when to release the page 2008-10-09 20:03 _mapcount is something new I haven't really looked at 2008-10-09 20:04 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-09 20:04 it has a private field for whatever the owner, usually whoever allocated the page, wants to put there 2008-10-09 20:04 quick q: what are 'ptes' and 'mms'? 2008-10-09 20:04 in practice, that is usually a list of buffers attached to the page 2008-10-09 20:05 pte is a page table entry 2008-10-09 20:05 mm is a memory management context 2008-10-09 20:05 that is, an address space 2008-10-09 20:05 it's a struct mm_struct 2008-10-09 20:05 each process has one, threads share one 2008-10-09 20:06 the struct page also used to have a lock 2008-10-09 20:06 seems to have gone missing now 2008-10-09 20:06 so we don't have one lock per page 2008-10-09 20:06 probably replaced by a hashed lock 2008-10-09 20:07 need to chase that down 2008-10-09 20:07 it has a very important field: index 2008-10-09 20:07 this is the position of the page within a page cache radix tree, if it is in one 2008-10-09 20:08 and that is the tie to vfs 2008-10-09 20:08 the page also has a pointer to the mapping it is in 2008-10-09 20:08 so mapping + index => retrieve the page 2008-10-09 20:09 and we can remove the page from a mapping because of those fields recorded in it 2008-10-09 20:10 there is also the lru link, which gives the vmm an idea of which page should be recovered when cache memory gets full 2008-10-09 20:10 over to buffer_head
2008-10-09 20:11 also has flags and a count, though the buffer flags field is named b_state for no particular reason 2008-10-09 20:11 has a pointer to the page the buffer head is attached to, on which the data belonging to the buffer is stored 2008-10-09 20:12 we figure out where on the page the buffer data is stored by looking at the low bits of the index, I think... 2008-10-09 20:12 we will come back to that and check it 2008-10-09 20:13 the buffer also points at a block device b_bdev, but this field is redundant now 2008-10-09 20:13 because we have buffer->page->mapping->... bdev 2008-10-09 20:14 there is an end_io function like the endio for a bio 2008-10-09 20:14 serves the same purpose, and is now also largely redundant 2008-10-09 20:15 assoc_buffers is a crude scheme for flushing file metadata along with data for primitive filesystems like ext2 that let the vfs do all their work for them 2008-10-09 20:16 we also don't see a lock in the buffer_head itself 2008-10-09 20:16 though for both pages and buffers, locking is a huge element of how they are used 2008-10-09 20:18 in the case of buffers, we spin on one of the state bits 2008-10-09 20:18 we will go find that code later also 2008-10-09 20:20 it's __lock_buffer 2008-10-09 20:20 defined somewhere lxr can't find 2008-10-09 20:21 (I'd thought we were meeting at 9pm today...) 2008-10-09 20:22 it should be in buffer.c 2008-10-09 20:22 because we did last time? 2008-10-09 20:22 should be 2008-10-09 20:22 http://lxr.linux.no/linux+v2.6.26.5/fs/buffer.c#L70 2008-10-09 20:23 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L70 2008-10-09 20:23 thanks 2008-10-09 20:23 ok, see it's a lock that spins on a bit 2008-10-09 20:23 another quick q: how are the page structures managed? Is there a huge array somewhere? Or some lists?
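The fields walked through above can be pictured with a small userspace model. Everything named toy_* is hypothetical and far simpler than the real struct page and struct buffer_head: it keeps only the count, the (mapping, index) identity, the private pointer to a ring of buffers, and a lock bit in the buffer state word. Two details to check against the 2.6.26 source: a buffer's position within its page comes from the low bits of b_data (the bh_offset() macro in buffer_head.h), not from the page index; and __lock_buffer() sleeps waiting for the BH_Lock bit instead of busy-waiting, while this sketch spins for brevity. Assumes C11 (stdatomic.h, aligned_alloc).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-ins, NOT kernel definitions: only the fields discussed above. */
#define TOY_PAGE_SIZE 4096u
#define TOY_PAGE_MASK (~(uintptr_t)(TOY_PAGE_SIZE - 1))
#define TOY_BH_LOCK 1                    /* bit number of the lock in b_state */

struct toy_mapping;                      /* plays the role of struct address_space */

struct toy_page {
	atomic_uint count;                   /* references; free the page at zero */
	struct toy_mapping *mapping;         /* page cache this page belongs to */
	unsigned long index;                 /* page-sized offset within the mapping */
	void *private;                       /* usually the first buffer on the page */
	unsigned char *data;                 /* page-aligned memory */
};

struct toy_buffer {
	atomic_ulong b_state;                /* flag bits, including the lock bit */
	struct toy_page *b_page;             /* page holding this buffer's data */
	unsigned char *b_data;               /* points inside b_page->data */
	unsigned long b_blocknr;             /* cached physical block number */
	struct toy_buffer *b_this_page;      /* circular list of buffers on the page */
};

/* Allocate a page-aligned page and give it its (mapping, index) identity. */
static int toy_page_alloc(struct toy_page *page, struct toy_mapping *mapping,
                          unsigned long index)
{
	page->data = aligned_alloc(TOY_PAGE_SIZE, TOY_PAGE_SIZE);
	if (!page->data)
		return -1;
	atomic_init(&page->count, 1);
	page->mapping = mapping;
	page->index = index;
	page->private = NULL;
	return 0;
}

/* Carve the page into n buffers of blocksize bytes, linked in a ring. */
static void toy_attach_buffers(struct toy_page *page, struct toy_buffer *bufs,
                               unsigned n, unsigned blocksize)
{
	assert(n >= 1 && (size_t)n * blocksize <= TOY_PAGE_SIZE);
	for (unsigned i = 0; i < n; i++) {
		atomic_init(&bufs[i].b_state, 0);
		bufs[i].b_page = page;
		bufs[i].b_data = page->data + (size_t)i * blocksize;
		bufs[i].b_blocknr = 0;
		bufs[i].b_this_page = &bufs[(i + 1) % n];
	}
	page->private = bufs;
}

/* Where the buffer sits in its page: low bits of b_data, as bh_offset() does. */
static size_t toy_bh_offset(const struct toy_buffer *bh)
{
	return (size_t)((uintptr_t)bh->b_data & ~TOY_PAGE_MASK);
}

/* Bit lock: atomically set the bit, retry while someone else already held it. */
static void toy_lock_buffer(struct toy_buffer *bh)
{
	const unsigned long bit = 1ul << TOY_BH_LOCK;
	while (atomic_fetch_or(&bh->b_state, bit) & bit)
		;                                /* busy-wait: sketch only */
}

static void toy_unlock_buffer(struct toy_buffer *bh)
{
	const unsigned long bit = 1ul << TOY_BH_LOCK;
	unsigned long old = atomic_fetch_and(&bh->b_state, ~bit);
	assert(old & bit);                   /* unlocking an unlocked buffer is a bug */
}

static int toy_buffer_locked(struct toy_buffer *bh)
{
	return (int)((atomic_load(&bh->b_state) >> TOY_BH_LOCK) & 1);
}
```

With a 4096-byte page and 1024-byte blocks, the fourth buffer sits at offset 3072 and its b_this_page points back to the first. Freeing page->data is left to the caller here; a real page cache would drop the page when its count reaches zero.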
2008-10-09 20:23 not very efficient 2008-10-09 20:23 a huge array 2008-10-09 20:23 very simple/crude 2008-10-09 20:24 and sometimes a big problem because of the size of that array 2008-10-09 20:24 let's have a look at lock_page for comparison 2008-10-09 20:25 searching... 2008-10-09 20:25 yeah coming up empty 2008-10-09 20:25 been worked on lately 2008-10-09 20:25 http://lxr.linux.no/linux+v2.6.26.5/include/linux/pagemap.h#L167 2008-10-09 20:25 :) 2008-10-09 20:25 2.6.27 has no index 2008-10-09 20:26 right 2008-10-09 20:26 I should have mentioned 2008-10-09 20:26 2.6.26.6 2008-10-09 20:26 oh... I'm still on .5 :P 2008-10-09 20:26 http://lxr.linux.no/linux+v2.6.26.6/include/linux/pagemap.h#L167 :D 2008-10-09 20:26 http://lxr.linux.no/linux+v2.6.26.6/mm/filemap.c#L599 <- we see that the page lock is another bit spin lock 2008-10-09 20:27 does might_sleep do something or is that a statement for code testing? 2008-10-09 20:27 the closer you look at buffer_heads and struct pages, the more similar they look 2008-10-09 20:27 might_sleep will generate a printk warning if it is called under a spinlock 2008-10-09 20:28 if you have that debug option turned on 2008-10-09 20:28 ok, now we have some slight familiarity with those two, let's go look at a place where they are used together 2008-10-09 20:28 like block_read_full_page 2008-10-09 20:30 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L2093 <- block_read_full_page 2008-10-09 20:31 the page may or may not have a list of buffers attached to it 2008-10-09 20:31 if the blocksize is the same as the page size, the list will have one buffer 2008-10-09 20:31 otherwise some power-of-two number of buffers 2008-10-09 20:32 the first thing _full_page does is put a list of buffer (heads) on the page if it has none 2008-10-09 20:33 then it loops over the buffer list (again usually one buffer) to find any buffers not uptodate 2008-10-09 20:33 if it just put the buffers on the page, it should already know of course
2008-10-09 20:34 for any buffer that is not up to date, it makes a call into the filesystem, get_block 2008-10-09 20:34 which is a callback passed to it by the filesystem 2008-10-09 20:34 because block_read_full_page is always called from filesystem code 2008-10-09 20:35 it is just a library helper to make it easy to do IO on a page 2008-10-09 20:35 easy, but pretty sloppy and executing way too much code 2008-10-09 20:36 which can be masked by a slow disk, but not entirely 2008-10-09 20:37 see further down, there is some coupling of the buffer flags and the page flags in that if all buffers are up to date, the page is set up to date as well 2008-10-09 20:37 does this mean we still have a page cache, even if the file system block size is not page sized? and this is what converts from sub-page sized buffers to the page used by the page cache? 2008-10-09 20:37 this is where we handle the impedance mismatch between page size and block size, if that answers your question 2008-10-09 20:38 the page cache is indexed by pages, but filesystems like ext3 treat it as if it was indexed by buffers 2008-10-09 20:38 (perhaps) 2008-10-09 20:38 see ext3_bread 2008-10-09 20:39 this mismatch is a huge source of complexity in vfs and mm, and a nasty source of bugs 2008-10-09 20:39 by the time we've done the tux3 kernel port, everybody will know exactly what I'm talking about 2008-10-09 20:40 ok, buffer locking is a little counterintuitive 2008-10-09 20:40 we will keep a buffer locked while reading, but not while writing in general 2008-10-09 20:40 same with pages 2008-10-09 20:41 while reading from disk into the buffer, or from the buffer? 2008-10-09 20:41 disk into buffer 2008-10-09 20:41 always what is meant by "read" in here 2008-10-09 20:42 finally, the buffers that need reading are submitted via submit_bh 2008-10-09 20:42 doesn't it make sense to lock reads then? since that's when the memory content actually changes?
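The read path just described can be sketched as a synchronous userspace model: attach buffers if the page has none, call a filesystem-supplied get_block for each buffer that is not yet mapped, read every buffer that is not up to date, and mark the page up to date only when all of its buffers are. The toy_* types, the toy_get_block_t signature and the in-memory "disk" are hypothetical. The kernel's get_block_t also takes the inode and a create flag, and the real block_read_full_page locks each buffer, submits it with submit_bh() and lets the I/O completion handler set the page up to date, rather than doing it inline as here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_MAX_BUFS 8

/* Hypothetical get_block: map file block iblock to a physical block. */
typedef int (*toy_get_block_t)(void *fs, unsigned long iblock, unsigned long *phys);

struct toy_buffer {
	bool uptodate, mapped;
	unsigned long blocknr;               /* physical block, valid once mapped */
	unsigned char *data;                 /* points into the page */
};

struct toy_page {
	unsigned long index;                 /* page-sized offset within the file */
	bool uptodate;
	unsigned nbufs;                      /* 0 until buffers are attached */
	struct toy_buffer bufs[TOY_MAX_BUFS];
	unsigned char data[TOY_PAGE_SIZE];
};

/* A "disk": an array of fixed-size blocks in memory. */
struct toy_dev {
	unsigned char *blocks;
	unsigned blocksize;
};

/* Example fs: the file lives contiguously from physical block *(unsigned long *)fs. */
static int toy_linear_get_block(void *fs, unsigned long iblock, unsigned long *phys)
{
	*phys = *(const unsigned long *)fs + iblock;
	return 0;
}

static int toy_read_full_page(struct toy_page *page, const struct toy_dev *dev,
                              toy_get_block_t get_block, void *fs)
{
	unsigned per_page = TOY_PAGE_SIZE / dev->blocksize;
	assert(per_page >= 1 && per_page <= TOY_MAX_BUFS);

	/* 1. Put buffers on the page if it has none. */
	if (page->nbufs == 0) {
		for (unsigned i = 0; i < per_page; i++)
			page->bufs[i] = (struct toy_buffer){
				.data = page->data + (size_t)i * dev->blocksize };
		page->nbufs = per_page;
	}

	/* 2. For every buffer not up to date: map it if needed, then read it. */
	unsigned long first = page->index * per_page;   /* first file block on page */
	for (unsigned i = 0; i < page->nbufs; i++) {
		struct toy_buffer *bh = &page->bufs[i];
		if (bh->uptodate)
			continue;
		if (!bh->mapped) {
			if (get_block(fs, first + i, &bh->blocknr))
				return -1;
			bh->mapped = true;
		}
		/* Stands in for locking bh and submit_bh(READ, bh); synchronous here. */
		memcpy(bh->data, dev->blocks + bh->blocknr * dev->blocksize, dev->blocksize);
		bh->uptodate = true;
	}

	/* 3. The page flag follows the buffer flags: up to date only if all buffers are. */
	bool all = true;
	for (unsigned i = 0; i < page->nbufs; i++)
		all = all && page->bufs[i].uptodate;
	page->uptodate = all;
	return 0;
}
```

With 1024-byte blocks and a file starting at physical block 2, reading page 1 maps file blocks 4..7 to physical blocks 6..9 and fills all four buffers.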
2008-10-09 20:42 which is just a simple wrapper on submit_bio 2008-10-09 20:42 which is an old friend of ours 2008-10-09 20:42 yes, we lock reads, not writes 2008-10-09 20:43 this is a horribly inefficient code path we're looking at 2008-10-09 20:44 and doesn't actually get used much any more, though there are cases that still trigger it 2008-10-09 20:44 I don't know what they are exactly, but again we will have a good idea after doing the port 2008-10-09 20:44 let's look at block_write_full_page while we are in here 2008-10-09 20:45 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L2093 2008-10-09 20:45 sorry 2008-10-09 20:45 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1645 2008-10-09 20:45 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1645 2008-10-09 20:45 right 2008-10-09 20:46 starts the same way 2008-10-09 20:46 as often happens in disk io 2008-10-09 20:46 but unfortunately, the kernel takes little advantage of such symmetry 2008-10-09 20:47 we take care of zeroing a partial page extending beyond end of file here 2008-10-09 20:47 "unmap_underlying_metadata" is a scary function to see here 2008-10-09 20:47 we'll leave that for another day 2008-10-09 20:48 see, we keep a state bit in the buffer to tell us whether we need to call the fs get_block method or not 2008-10-09 20:48 what is a non-blockdev mapping?
2008-10-09 20:48 page cache I guess 2008-10-09 20:48 funny terminology 2008-10-09 20:50 a slight fib, it seems we keep the buffer locked all the way through the write here 2008-10-09 20:50 the page however gets unlocked 2008-10-09 20:50 it is probably unnecessary to keep the buffer locked 2008-10-09 20:51 "redirty_page_for_writeback" is another scary visitor to see here 2008-10-09 20:51 hacking around various subtle loopholes in the vm design 2008-10-09 20:51 now let's see, where is the get_block call 2008-10-09 20:52 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1696 2008-10-09 20:53 by the way, each buffer has a size field, which is now redundant 2008-10-09 20:53 still hanging around 2008-10-09 20:54 we now find out the size from the block device pointed to by the buffer->page->mapping->inode->sb 2008-10-09 20:54 something like that 2008-10-09 20:55 (large number of indirections...) 2008-10-09 20:55 ah, what we are doing in unmap_underlying_metadata is taking care of the lack of coherence between the inode page cache and the block device buffer cache 2008-10-09 20:56 there might have been a page sitting around in the buffer cache mapped to the same physical block 2008-10-09 20:56 see the bh->b_blocknr, that is what we call buffer->index in the tux3 userspace code 2008-10-09 20:57 but the usage is different here 2008-10-09 20:57 in the kernel, this caches the _physical_ block the buffer is mapped to, even if the buffer is on a page in an inode page cache 2008-10-09 20:58 in the tux3 userspace code, the physical mapping is never cached 2008-10-09 20:58 and we use that field like the kernel uses the page->index field, to know what the logical offset of the data is 2008-10-09 20:59 it turns out that caching the physical block pointer is pretty useless, almost always 2008-10-09 21:00 since nearly all writes will just write the buffer to the physical location once then address it out of cache after that 2008-10-09 21:00 it might save a get_block trip into the filesystem only
for a rewrite 2008-10-09 21:01 the 9 oclock horn just sounded 2008-10-09 21:01 this was a pretty dry one today, no? 2008-10-09 21:01 but important 2008-10-09 21:01 seemed pretty hard core 2008-10-09 21:01 this little corner of the kernel will be visited frequently by anybody doing filesystem work 2008-10-09 21:01 yup 2008-10-09 21:02 ok, on tuesday we're going to get much more hard core 2008-10-09 21:02 is a lot of the complication around here cruft? or is it actually needed for performance and/or edge cases? 2008-10-09 21:02 because we want to use this buffer+page mechanism in ways it was not necessarily designed for 2008-10-09 21:02 major cruft, yes 2008-10-09 21:03 I hope to make it obsolete 2008-10-09 21:03 in due course 2008-10-09 21:03 but we're going to have to work with it for now 2008-10-09 21:03 changing core kernel to merge tux3 isn't really wise 2008-10-09 21:04 questions? 2008-10-09 21:04 otherwise my little girl wants to try out the new video game 2008-10-09 21:04 ACTION doesn't have any :( 2008-10-09 21:04 aaa... what video game is it? :P 2008-10-09 21:04 razvanm, try to read through some more of the block_ functions in buffer.c 2008-10-09 21:04 ACTION doesn't really know how to ask intelligent questions... 2008-10-09 21:04 bioshock 2008-10-09 21:05 my first free time will be after 22 :( 2008-10-09 21:05 I heard about bioshock... 
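The write side walked through in this session (zero the tail of a page that straddles end of file, and consult the "mapped" state bit before calling get_block) can be sketched the same way. Again the toy_* names, the call counter and the in-memory disk are hypothetical, and most of block_write_full_page is left out: buffer and page locking, asynchronous submission, redirty_page_for_writeback and unmap_underlying_metadata. The counter makes the point from the discussion concrete: the cached physical block number only saves a get_block trip when the same page is written again.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_MAX_BUFS 8

/* Hypothetical get_block: map file block iblock to a physical block. */
typedef int (*toy_get_block_t)(void *fs, unsigned long iblock, unsigned long *phys);

struct toy_buffer {
	bool mapped;                         /* stands in for the "mapped" state bit */
	unsigned long blocknr;               /* cached physical block, kernel-style */
	unsigned char *data;
};

struct toy_page {
	unsigned long index;                 /* page-sized offset within the file */
	unsigned nbufs;
	struct toy_buffer bufs[TOY_MAX_BUFS];
	unsigned char data[TOY_PAGE_SIZE];
};

struct toy_dev {
	unsigned char *blocks;
	unsigned blocksize;
};

static int toy_get_block_calls;          /* how often we went to the filesystem */

static int toy_linear_get_block(void *fs, unsigned long iblock, unsigned long *phys)
{
	toy_get_block_calls++;
	*phys = *(const unsigned long *)fs + iblock;
	return 0;
}

static int toy_write_full_page(struct toy_page *page, unsigned long long i_size,
                               const struct toy_dev *dev, toy_get_block_t get_block,
                               void *fs)
{
	unsigned bs = dev->blocksize, per_page = TOY_PAGE_SIZE / bs;
	unsigned long long page_start = (unsigned long long)page->index * TOY_PAGE_SIZE;
	assert(per_page >= 1 && per_page <= TOY_MAX_BUFS);

	if (page_start >= i_size)
		return 0;                        /* page lies wholly past EOF */

	/* Zero the part of a page that extends beyond EOF before it goes to disk. */
	if (i_size - page_start < TOY_PAGE_SIZE) {
		size_t tail = (size_t)(i_size - page_start);
		memset(page->data + tail, 0, TOY_PAGE_SIZE - tail);
	}

	if (page->nbufs == 0) {
		for (unsigned i = 0; i < per_page; i++)
			page->bufs[i] = (struct toy_buffer){ .data = page->data + (size_t)i * bs };
		page->nbufs = per_page;
	}

	unsigned long first = page->index * per_page;
	for (unsigned i = 0; i < page->nbufs; i++) {
		struct toy_buffer *bh = &page->bufs[i];
		if (page_start + (unsigned long long)i * bs >= i_size)
			break;                       /* this block and the rest are past EOF */
		/* The mapped bit says whether we still need the fs get_block method. */
		if (!bh->mapped) {
			if (get_block(fs, first + i, &bh->blocknr))
				return -1;
			bh->mapped = true;
		}
		memcpy(dev->blocks + bh->blocknr * bs, bh->data, bs);   /* "submit" write */
	}
	return 0;
}
```

Writing a 1500-byte file through one 4096-byte page with 1024-byte blocks calls get_block twice (file blocks 0 and 1) and leaves bytes 1500 onward as zeros on disk; writing the same page again makes no further get_block calls because both buffers are already mapped.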
2008-10-09 21:05 oh right 2008-10-09 21:05 where "some" means a little bit 2008-10-09 21:05 it's important to have the layout of the code seeping into you in the background 2008-10-09 21:06 bok 2008-10-09 21:06 a few minutes of looking, then you can go away and let it seep 2008-10-09 21:06 :-) 2008-10-09 21:07 when I get into the nitty gritty of atomic commit, this buffer cache interface gets really important 2008-10-09 21:07 this is where most of the action happens 2008-10-09 21:07 sorry 2008-10-09 21:07 page cache - with buffers attached 2008-10-09 21:07 and we will make it act like a buffer cache, as tux3 uses in user space 2008-10-09 21:07 ok, I'm out 2008-10-09 21:07 have a nice evening 2008-10-09 21:14 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has left #tux3 2008-10-09 21:42 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-09 22:02 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-09 22:47 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-09 23:01 hello 2008-10-09 23:01 anyone? 2008-10-09 23:48 hey 2008-10-09 23:48 hi pranith 2008-10-09 23:52 shapor: locking that data structure limited it to about 2.5-processor scalability 2008-10-09 23:52 it's in old OLS papers if you want to read about it 2008-10-10 00:34 bh: which ds are u talking about? 2008-10-10 02:22 the page cache dictionary itself 2008-10-10 02:22 it's a well-known problem 2008-10-10 02:59 -!- kbingham(~kbingham@cvs.mpc-ogw.co.uk) has joined #tux3 2008-10-10 03:12 flipsout: leaf->groups is being used without being initialized... 2008-10-10 03:12 are u assuming that it is 0 in dleaf_chop? 2008-10-10 03:15 so is group->count 2008-10-10 03:25 -!- pgquiles(~pgquiles@16.Red-83-41-239.dynamicIP.rima-tde.net) has joined #tux3 2008-10-10 04:20 Hi, I am a new guy here. I wanted to learn about Tux3 and contribute to it.... any guidance will be greatly appreciated...
2008-10-10 04:54 hello less 2008-10-10 04:54 you still here? 2008-10-10 04:56 pranith : yes 2008-10-10 04:56 you know C? 2008-10-10 04:56 yep, i have done some kernel programming also.. 2008-10-10 04:56 good 2008-10-10 04:56 bt not very extensive.. 2008-10-10 04:57 did you go through the design document? 2008-10-10 04:57 of tux3? 2008-10-10 04:57 http://shapor.com/tux3/shapor-tux3/doc/design.html 2008-10-10 04:57 yep, i have taken an overview of it 2008-10-10 04:57 -!- FelipeS_(~Felipe@lawn-128-61-26-178.lawn.gatech.edu) has joined #tux3 2008-10-10 04:57 ok 2008-10-10 04:58 you can go through the code.. run the tests 2008-10-10 04:58 write your own tests 2008-10-10 04:58 implement the ideas 2008-10-10 04:58 ok, i will start with running the tests then... 2008-10-10 04:58 ok 2008-10-10 04:58 that should give me some more insight.. 2008-10-10 04:58 thanx.. :-) 2008-10-10 04:58 yup 2008-10-10 04:59 and please 2008-10-10 04:59 try to document what you learn 2008-10-10 04:59 we can compare notes later ;) 2008-10-10 04:59 ohh, sure... 2008-10-10 04:59 has anyone documented it before..?? 2008-10-10 04:59 so that i can get some help from it..? 2008-10-10 05:00 nope, the in-code documentation is all that we have 2008-10-10 05:00 :) 2008-10-10 05:00 i've done some basic stuff 2008-10-10 05:00 but not online 2008-10-10 05:01 it's in a nice notebook 2008-10-10 05:01 ok... 2008-10-10 05:01 will digitize it 2008-10-10 05:01 soon 2008-10-10 05:01 i will create my logs online only.. 2008-10-10 05:01 that's good 2008-10-10 05:01 may b it will help future beginners.. 2008-10-10 05:01 :-) 2008-10-10 05:02 yeah 2008-10-10 05:02 you good at diagrams? 2008-10-10 05:02 maybe you can make those 2008-10-10 05:02 i think they are very helpful and are missing now 2008-10-10 05:02 yep... bt not ASCII diagrams... 2008-10-10 05:03 any diagrams will do 2008-10-10 05:03 they take quite a lot of time..
2008-10-10 05:03 gif, png :) 2008-10-10 05:03 yeah 2008-10-10 05:03 thats the main problem 2008-10-10 05:03 yep.. i can do dat.. 2008-10-10 06:29 -!- less(~less@145-116-238-192.uilenstede.casema.nl) has joined #tux3 2008-10-10 06:56 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-10 07:59 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-10 08:33 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-10 08:33 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-10 08:37 -!- ajonat(~ajonat@190.48.94.249) has joined #tux3 2008-10-10 09:19 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-10 09:21 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-10 09:47 -!- Kirantpatil(~kiran@122.167.208.110) has joined #tux3 2008-10-10 09:47 -!- Kirantpatil(~kiran@122.167.208.110) has left #tux3 2008-10-10 09:56 -!- ajonat_(~ajonat@190.48.107.122) has joined #tux3 2008-10-10 10:02 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-10 10:08 anyone check out UBIFS yet? 2008-10-10 10:08 http://www.linux-mtd.infradead.org/doc/ubifs.html 2008-10-10 10:14 Violin report on 2.6.27 : pci_dma_mapping_error(dma_handle) in 2.6.27 to pci_dma_mapping_error(pdev, dma_handle) 2008-10-10 10:23 morning flips 2008-10-10 10:23 good morning 2008-10-10 10:23 violin report? 
2008-10-10 10:35 chatted with brad this morning 2008-10-10 10:35 was talking about 2.6.27 2008-10-10 10:35 my syntax was bad 2008-10-10 10:36 should have said "violin reports" 2008-10-10 12:29 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-10-10 14:55 folks 2008-10-10 16:02 -!- pgquiles(~pgquiles@16.Red-83-41-239.dynamicIP.rima-tde.net) has joined #tux3 2008-10-10 16:57 sk8 oclock 2008-10-10 16:57 comes earlier in winter 2008-10-10 17:58 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-10 20:34 -!- ajonat(~ajonat@190.48.107.122) has joined #tux3 2008-10-10 22:17 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-10 23:18 -!- Bobby_(~Bobby@122.162.70.20) has joined #tux3 2008-10-10 23:31 hey all 2008-10-11 00:40 hi 2008-10-11 00:41 title of the next post is set 2008-10-11 00:41 "Thinking about Syncing" 2008-10-11 00:41 even better, there's an algorithm in mind 2008-10-11 01:50 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-11 02:47 -!- Bobby_(~Bobby@122.162.70.20) has joined #tux3 2008-10-11 05:35 -!- Bobby_(~Bobby@122.162.70.20) has joined #tux3 2008-10-11 06:20 -!- Bobby_(~Bobby@122.162.70.20) has joined #tux3 2008-10-11 06:40 -!- Bobby_(~Bobby@122.162.70.20) has joined #tux3 2008-10-11 07:49 -!- less(~less@145-116-238-192.uilenstede.casema.nl) has joined #tux3 2008-10-11 08:13 -!- Bobby__(~Bobby@122.162.70.206) has joined #tux3 2008-10-11 09:18 -!- pgquiles(~pgquiles@247.Red-83-41-112.dynamicIP.rima-tde.net) has joined #tux3 2008-10-11 09:36 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-11 12:12 -!- Bobby__(~Bobby@122.162.70.206) has joined #tux3 2008-10-11 12:22 -!- pranihome(~Bobby@122.162.70.206) has joined #tux3 2008-10-11 12:34 http://lodge.glasgownet.com/2008/10/11/its-hammer-time/ <- hammer and tux3 2008-10-11 12:44 flips, are hammer and tux3 comparable? 2008-10-11 12:48 never went through what hammer was really... 
2008-10-11 13:07 they use a similar method to do versioning 2008-10-11 13:08 significant differences too 2008-10-11 13:08 hammer has linear versioning, one long chain of snapshots at high granularity while tux3 does tree versioning with snapshots of snapshots 2008-10-11 13:35 -!- ajonat(~ajonat@190.48.107.122) has joined #tux3 2008-10-11 13:39 -!- bobby(~bobby@122.162.70.206) has joined #tux3 2008-10-11 13:39 amazing facts department: a freshly untarred 2.6.26.5 tree has 276008641 bytes of file data / 25715 files, average file size 10733 bytes 2008-10-11 13:40 since untarring kernel trees is one of the main linux fs benchmarks, we care about this 2008-10-11 13:40 average file size has grown over the years from about 8k to nearly 11k 2008-10-11 13:40 not growing very fast, really 2008-10-11 13:41 in the same period the total file data size has about tripled 2008-10-11 15:41 yet another amazing fact: average length of a filename in 2.6.26.5 kernel tree is 36 chars 2008-10-11 15:41 translates into 85 names per ext3 dirent block 2008-10-11 15:50 -!- ajonat_(~ajonat@190.48.116.113) has joined #tux3 2008-10-11 17:01 hey 2008-10-11 17:06 9080 / 1000. 
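The kernel-tree figures quoted above are easy to recheck. A minimal sketch with hypothetical helper names: mean_bytes() reproduces the 10733-byte average (276008641 bytes over 25715 files, about 10733.4 before truncation), and percent_rounded() is the same kind of ratio used further on in the channel to compare untar throughput with raw dd throughput (25 MB/sec against roughly 60 MB/sec comes out at about 42%).

```c
#include <assert.h>
#include <stdint.h>

/* Mean size in bytes, truncated the way C integer division truncates. */
static uint64_t mean_bytes(uint64_t total_bytes, uint64_t nfiles)
{
	assert(nfiles > 0);
	return total_bytes / nfiles;
}

/* part as a percentage of whole, rounded to the nearest whole percent. */
static uint64_t percent_rounded(uint64_t part, uint64_t whole)
{
	assert(whole > 0);
	return (part * 100 + whole / 2) / whole;
}
```

Integer arithmetic is enough here; adding whole / 2 before dividing turns truncation into round-to-nearest.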
2008-10-11 17:06 whoops 2008-10-11 17:06 wrong window ;) 2008-10-11 17:09 my calculations suggest we will be able to achieve 97.6% of raw disk write bandwidth for the case of untarring a kernel tree to an empty filesystem, complete with atomic commit, but provided the direct data pointer inode attribute is implemented, to get rid of the btree root + leaf per file 2008-10-11 17:10 with the current layout, we can get about 56% of raw 2008-10-11 17:14 Ext3 at present achieves 14% of raw bandwidth as measured on my system here 2008-10-11 17:14 I'm sure I'm missing some overheads that will pull that 97% number lower 2008-10-11 17:14 but we have an awful lot of room for error here 2008-10-11 17:15 put it another way: ext3 write performance sucks hard 2008-10-11 17:15 low bar to jump over 2008-10-11 19:11 what accounts for ext3's write performance issues? 2008-10-11 19:24 journal for one thing 2008-10-11 19:26 what else? 2008-10-11 19:27 is xfs faster? 2008-10-11 19:29 don't know, why don't you run some tests? 2008-10-11 19:33 jjfjfjfjfjfjf 2008-10-11 19:33 shit 2008-10-11 19:33 sorry 2008-10-11 19:39 ok, a fairer test shows ext3 coming in at 25 MB/sec 2008-10-11 19:39 untar a tarfile from ramfs 2008-10-11 19:40 still a long way from there to the speed of the disk 2008-10-11 19:45 most disks top out at that rate 2008-10-11 19:45 unless you have a raid array or something like that 2008-10-11 19:46 -!- data`(~data@echo489.server4you.de) has joined #tux3 2008-10-11 19:47 not really 2008-10-11 19:47 dd will get about 64 MB/sec on this disk 2008-10-11 20:00 ok, just confirmed...
dd from ramfs to my disk runs from 56 to 64 MB/sec 2008-10-11 20:01 same disk that's running my workstation and server right now ;) 2008-10-11 20:01 course, have to be quite careful not to destroy it while running the dd write test 2008-10-11 20:02 let's round that throughput to 60 MB/s 2008-10-11 20:03 Ext3 therefore hits about 42% of raw bandwidth 2008-10-11 20:04 we are aiming higher with tux3 2008-10-11 20:05 initially, just over 50% of raw would be nice, before optimizing to get rid of the two btree blocks per file for small files 2008-10-11 20:05 then I don't see why we can't hit 80-90% of raw after optimizing 2008-10-11 22:18 -!- pranith_(~bobby@122.162.70.206) has joined #tux3 2008-10-11 22:18 hey all 2008-10-11 22:29 hi 2008-10-11 23:11 flips, hey 2008-10-11 23:11 hi 2008-10-12 00:19 cricket rocks :) 2008-10-12 01:28 -!- pgquiles(~pgquiles@247.Red-83-41-112.dynamicIP.rima-tde.net) has joined #tux3 2008-10-12 04:53 -!- Bobby_(~Bobby@122.162.70.206) has joined #tux3 2008-10-12 04:53 hey all 2008-10-12 05:04 -!- pranith_(~Bobby@122.162.70.206) has joined #tux3 2008-10-12 05:09 -!- pranith_(~Bobby@122.162.70.206) has joined #tux3 2008-10-12 05:22 -!- pranith_(~Bobby@122.162.70.206) has joined #tux3 2008-10-12 05:35 -!- pranith_(~Bobby@122.162.70.206) has joined #tux3 2008-10-12 06:39 hehehe 2008-10-12 08:02 sry, wrong channel 2008-10-12 08:02 :) 2008-10-12 08:11 -!- pgquiles(~pgquiles@249.Red-79-155-127.staticIP.rima-tde.net) has joined #tux3 2008-10-12 08:12 flips, you have pretty cool coding skills.. 
am learning a lot by reading the code :) 2008-10-12 09:13 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-12 10:27 -!- pranith_(~Bobby@122.162.70.206) has joined #tux3 2008-10-12 12:14 -!- ajonat(~ajonat@190.48.106.99) has joined #tux3 2008-10-12 18:48 -!- less(~less@145.116.238.192) has joined #tux3 2008-10-12 20:07 -!- Kirantpatil(~kiran@122.167.201.131) has joined #tux3 2008-10-12 20:07 -!- Kirantpatil(~kiran@122.167.201.131) has left #tux3 2008-10-12 23:30 -!- pranith(ca4bcee2@webchat.mibbit.com) has joined #tux3 2008-10-13 00:20 -!- pranith(ca4bcee2@webchat.mibbit.com) has joined #tux3 2008-10-13 00:28 -!- kbingham(~kbingham@cvs.mpc-ogw.co.uk) has joined #tux3 2008-10-13 05:46 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-13 08:07 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-13 09:15 -!- Bobby_(~Bobby@122.162.69.155) has joined #tux3 2008-10-13 09:31 -!- Bobby_(~Bobby@122.162.69.155) has joined #tux3 2008-10-13 09:38 -!- Bobby_(~Bobby@122.162.69.155) has joined #tux3 2008-10-13 09:45 -!- Bobby_(~Bobby@122.162.69.155) has joined #tux3 2008-10-13 09:50 -!- Bobby_(~Bobby@122.162.69.155) has joined #tux3 2008-10-13 09:58 -!- Bobby_(~Bobby@122.162.69.155) has joined #tux3 2008-10-13 10:01 -!- Bobby_(~Bobby@122.162.69.155) has joined #tux3 2008-10-13 10:02 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-13 10:03 -!- tim_dimm_(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-13 10:06 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-13 10:10 -!- Bobby_(~Bobby@122.162.69.155) has joined #tux3 2008-10-13 10:15 -!- Bobby_(~Bobby@122.162.69.155) has joined #tux3 2008-10-13 10:19 -!- Bobby_(~Bobby@122.162.69.155) has joined #tux3 2008-10-13 10:25 -!- Bobby_(~Bobby@122.162.69.155) has joined #tux3 2008-10-13 10:30 -!- Bobby_(~Bobby@122.162.69.155) has joined #tux3 2008-10-13 10:31 -!- Bobby_(~Bobby@122.162.69.155) has joined 
#tux3 2008-10-13 10:36 -!- Bobby_(~Bobby@122.162.69.155) has joined #tux3 2008-10-13 10:38 -!- Bobby_(~Bobby@122.162.69.155) has joined #tux3 2008-10-13 10:51 -!- Bobby_(~Bobby@122.162.69.155) has joined #tux3 2008-10-13 10:54 -!- pranihome(~Bobby@122.162.69.155) has joined #tux3 2008-10-13 10:56 -!- pranihome(~Bobby@122.162.69.155) has joined #tux3 2008-10-13 11:20 helloo 2008-10-13 11:20 it's been pretty quiet here lately 2008-10-13 11:45 revising trees... complete, balanced, red-black... 2008-10-13 12:13 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-13 12:40 flips: your posting is at the top of popular messages on KernelTrap (again :) 2008-10-13 12:42 :) 2008-10-13 12:42 hi nataliep 2008-10-13 12:42 hi dan :) 2008-10-13 12:43 the one where I predict we will get 99% of media speed presumably 2008-10-13 12:43 now... to make that happen 2008-10-13 13:25 folks 2008-10-13 13:27 flips: how's it going? 2008-10-13 13:28 moving along 2008-10-13 14:04 -!- flips(~phillips@phunq.net) has joined #tux3 2008-10-13 14:04 -!- data`(~data@echo489.server4you.de) has joined #tux3 2008-10-13 14:04 -!- less(~less@145.116.238.192) has joined #tux3 2008-10-13 14:04 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-13 14:04 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-13 14:04 -!- pgquiles(~pgquiles@249.Red-79-155-127.staticIP.rima-tde.net) has joined #tux3 2008-10-13 14:04 -!- ceatinge(~ceatinge@72.232.13.50) has joined #tux3 2008-10-13 14:04 -!- nataliep(~nataliep@207.47.98.129.static.nextweb.net) has joined #tux3 2008-10-13 14:04 -!- pranith(ca4bcee2@webchat.mibbit.com) has joined #tux3 2008-10-13 14:04 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-10-13 14:04 -!- zbrown(~rufius@208.64.37.45) has joined #tux3 2008-10-13 14:04 -!- bh(~billh@ip68-107-26-122.sd.sd.cox.net) has joined #tux3 2008-10-13 14:04 -!- vcgomes(~vcgomes@li17-238.members.linode.com) has joined #tux3 2008-10-13 14:29 -!-
tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-13 14:53 nearly sk8 oclock 2008-10-13 16:23 -!- mdakin(~chatzilla@79.97.85.155) has joined #tux3 2008-10-13 19:42 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-13 20:50 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-13 22:31 -!- pranith(ca4bcee2@webchat.mibbit.com) has joined #tux3 2008-10-14 00:24 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-10-14 01:50 -!- less(~less@145-116-238-192.uilenstede.casema.nl) has joined #tux3 2008-10-14 02:41 -!- pgquiles_(~pgquiles@249.Red-79-155-127.staticIP.rima-tde.net) has joined #tux3 2008-10-14 02:42 -!- ceatinge_(~ceatinge@veryclever.net) has joined #tux3 2008-10-14 02:54 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-14 02:54 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-14 06:31 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-14 06:31 -!- nataliep(~nataliep@207.47.98.129.static.nextweb.net) has joined #tux3 2008-10-14 09:36 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-14 12:12 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-14 15:44 -!- mokkpr01(~chatzilla@133-132.127-70.tampabay.res.rr.com) has joined #tux3 2008-10-14 18:30 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-14 18:57 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-10-14 19:23 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-14 19:45 ACTION is wondering in buffer.c... 2008-10-14 19:47 wandering? 2008-10-14 19:48 just saw an article in the news about McCain and YouTube and the DMCA... cute 2008-10-14 19:49 right, wandering :P 2008-10-14 19:50 damn... 
I press the wrong button and the whole chat window was cleared :| 2008-10-14 19:52 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-14 19:52 Hi 2008-10-14 19:56 hi ralucam 2008-10-14 19:57 T-3 minutes 2008-10-14 19:59 T-1 2008-10-14 20:00 ACTION is ready 2008-10-14 20:01 ok 2008-10-14 20:02 last time we went delving into the relationship between pages and buffers 2008-10-14 20:02 let's do some more of that 2008-10-14 20:02 let's look at sb_bread 2008-10-14 20:03 http://lxr.linux.no/linux+v2.6.26.6/include/linux/buffer_head.h#L278 ? 2008-10-14 20:03 2.6.27 indexed yet? 2008-10-14 20:03 http://lxr.linux.no/linux+v2.6.27/include/linux/buffer_head.h#L278 :D 2008-10-14 20:03 right 2008-10-14 20:03 you don't think sticking to 2.6.26 is worth it? 2008-10-14 20:03 I don't know if the search works though... 2008-10-14 20:04 2.6.26 is fine 2008-10-14 20:04 we don't need to get off on lockless page cache right now 2008-10-14 20:04 ok, bread is the classic bsd way of accessing buffer cache 2008-10-14 20:05 just one parameter, the buffer, and the block to read is in the buffer struct 2008-10-14 20:05 which I will just call buffer instead of buffer_head from now on 2008-10-14 20:05 the _head is entirely fluff, doesn't mean anything 2008-10-14 20:06 struct buffer traditionally also has a size 2008-10-14 20:06 that was a stupid idea 2008-10-14 20:06 and we have largely dropped that now 2008-10-14 20:07 instead, the size is taken from a field in the superblock, which is why we now have sb_bread, taking an sb, a physical block on the device referenced by the sb, and returning a buffer 2008-10-14 20:07 sb_bread(struct super_block *sb, sector_t block) 2008-10-14 20:07 seems to take a block number as a parameter... 
2008-10-14 20:07 yes
2008-10-14 20:08 oh, misinterpreted your comment
2008-10-14 20:08 (to mean the sb had a field with the block number)
2008-10-14 20:08 and my comment re the original bread was wrong, does not take a buffer
2008-10-14 20:08 http://www.ipnom.com/FreeBSD-Man-Pages/bread.3.html
2008-10-14 20:08 bread(struct uufsd *disk, ufs2_daddr_t blockno, void *data, size_t size)
2008-10-14 20:09 let's go find the old linux one just for interest
2008-10-14 20:09 2.4?
2008-10-14 20:09 yes
2008-10-14 20:10 you sure sb_bread doesn't just read the superblock from a specific block number?
2008-10-14 20:10 http://lxr.linux.no/linux-old+v2.4.31/fs/buffer.c#L1189
2008-10-14 20:10 yes, I'm sure
2008-10-14 20:10 oh, right
2008-10-14 20:11 missed the previous line ;-) someone posted a link to 278 instead of 277 ;-)
2008-10-14 20:11 struct buffer_head * bread(kdev_t dev, int block, int size) <- the legacy linux version
2008-10-14 20:11 I feel stupid...
2008-10-14 20:11 the freebsd version fell even more off the tracks
2008-10-14 20:11 ah, you just tied me for mistakes tonight ;)
2008-10-14 20:12 the trick of the pro is to make those mistakes faster than the amateur
2008-10-14 20:12 Maze: sorry :P
2008-10-14 20:12 lol
2008-10-14 20:13 ok, back to sb_bread
2008-10-14 20:13 simply calls __bread
2008-10-14 20:13 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1437
2008-10-14 20:13 which no longer needs to know anything about the sb
2008-10-14 20:14 the only reason we needed the sb was to know the blocksize and the underlying device
2008-10-14 20:14 this should have simply been called "bread"
2008-10-14 20:14 that is, the sb_bread should have been bread
2008-10-14 20:14 I'm guessing there are some weird interactions if you call these functions with non-constant size values
2008-10-14 20:15 don't do it
2008-10-14 20:15 never has worked properly
2008-10-14 20:15 never will
2008-10-14 20:15 putting the blocksize in the struct buffer was just a big mistake
2008-10-14 20:15 so size is basically a device property then?
2008-10-14 20:15 not really
2008-10-14 20:15 has been at times
2008-10-14 20:16 has caused lots of bugs
2008-10-14 20:16 see set_block_size
2008-10-14 20:16 or some name like that
2008-10-14 20:16 again, doesn't work properly
2008-10-14 20:16 the buffer size is properly just a property of the superblock, and actually one you can ignore
2008-10-14 20:16 as long as you don't overlap buffers of different sizes
2008-10-14 20:17 there is no cache coherence in that case
2008-10-14 20:17 meta-question: if we want to use bio's for everything... why do we care about bufferheads?
2008-10-14 20:17 we're going to arrive at a bio pretty soon in this little side trip
2008-10-14 20:17 let's try __bread_slow
2008-10-14 20:17 this code path has gotten deeply messed up lately
2008-10-14 20:18 with various optimizations + historical cruft
2008-10-14 20:18 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1239
2008-10-14 20:18 we see submit_bh there
2008-10-14 20:18 let's go in
2008-10-14 20:18 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L2862
2008-10-14 20:19 nothing terribly surprising
2008-10-14 20:19 and there we see some code much like you wrote for junkfs
2008-10-14 20:19 this is actually kind of a stupid arrangement
2008-10-14 20:19 the bio could have been allocated on the stack of the caller
2008-10-14 20:19 because we do a sync wait in __bread_slow
2008-10-14 20:20 ok, that is it for sb_bread
2008-10-14 20:20 anything not completely clear there?
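A minimal user-space model of the sb_bread() shape described above, assuming a toy in-memory "device" instead of a real block device: the block size is a field of the superblock (stored as a shift), the buffer carries no size of its own, and the caller passes only the superblock and a block number. The names toy_sb, toy_buffer, toy_sb_bread() and toy_brelse() are hypothetical, not kernel APIs; as with the historical interfaces, every failure collapses to a NULL return.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy superblock: the block size is a per-filesystem property. */
struct toy_sb {
    unsigned blocksize_bits;   /* e.g. 10 for 1K blocks */
    const unsigned char *dev;  /* stands in for the block device */
    size_t dev_size;           /* device size in bytes */
};

/* Hypothetical toy buffer: note there is no per-buffer size field. */
struct toy_buffer {
    uint64_t blocknr;
    unsigned char *data;
};

/* Read one block of the toy device; returns NULL on any failure. */
struct toy_buffer *toy_sb_bread(const struct toy_sb *sb, uint64_t block)
{
    size_t size = (size_t)1 << sb->blocksize_bits;
    uint64_t nblocks = sb->dev_size >> sb->blocksize_bits;
    struct toy_buffer *bh;

    if (block >= nblocks)
        return NULL;   /* caller cannot tell "out of range" from "no memory" */
    bh = malloc(sizeof *bh);
    if (!bh)
        return NULL;
    bh->data = malloc(size);
    if (!bh->data) {
        free(bh);
        return NULL;
    }
    bh->blocknr = block;
    /* The "I/O": copy the block out of the toy device. */
    memcpy(bh->data, sb->dev + (size_t)(block << sb->blocksize_bits), size);
    return bh;
}

void toy_brelse(struct toy_buffer *bh)
{
    if (bh) {
        free(bh->data);
        free(bh);
    }
}
```

Mixing two block sizes over one device would need two such superblocks with different blocksize_bits, and nothing in this model (or, per the discussion, in the real buffer cache) keeps overlapping buffers of different sizes coherent.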
2008-10-14 20:20 error handling ;-)
2008-10-14 20:21 hah
2008-10-14 20:21 very poor on this path
2008-10-14 20:21 these functions historically had no error report except "return NULL"
2008-10-14 20:21 much of Linux is still that way, very slowly changing
2008-10-14 20:22 if you check out my latest revs to the tux3 bio interface, there is a mechanism for sending back accurate errors there
2008-10-14 20:22 but usually we tend to drop the ball somewhere in the call chain, and not return the actual error, the higher level just guesses
2008-10-14 20:22 usually the guess is EIO or ENOMEM, randomly
2008-10-14 20:22 yeah, submit_bh returns a value, but it doesn't get checked, etc
2008-10-14 20:22 "don't be part of the problem"
2008-10-14 20:23 when you write your own kernel code
2008-10-14 20:23 you will even see stuff like that in my user space simulation
2008-10-14 20:23 C is just not very good at returning error codes
2008-10-14 20:23 anyway... I will fix it over time
2008-10-14 20:23 -!- cydork(~vihang@59.184.62.147) has joined #tux3
2008-10-14 20:23 it's the penalty you pay for having full control of exceptions...
2008-10-14 20:24 for not having?
2008-10-14 20:24 oh
2008-10-14 20:24 kind of
2008-10-14 20:25 it's more about having no good way to return multiple results from a function, one of which is an error code
2008-10-14 20:25 there's IS_ERR
2008-10-14 20:25 ever used it?
2008-10-14 20:25 painful
2008-10-14 20:25 beautiful hack if there ever was one...
2008-10-14 20:25 semantics are not completely obvious either
2008-10-14 20:25 but yeah, it's a little painful
2008-10-14 20:26 often not clear whether it wants err or -err
2008-10-14 20:26 I think part of the problem is it doesn't consider null an error
2008-10-14 20:26 IS_ERR? it wants a pointer
2008-10-14 20:26 and returns bool
2008-10-14 20:26 ERR_PTR
2008-10-14 20:26 I think of it as all one thing
2008-10-14 20:26 clumsy
2008-10-14 20:26 but what can you do?
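For reference, the ERR_PTR()/IS_ERR()/PTR_ERR() idiom mentioned above (include/linux/err.h) passes a negative errno back through a pointer return by reserving the top MAX_ERRNO (4095) values of the address space. The following is a user-space re-creation using renamed, hypothetical helpers (my_err_ptr() and friends, plus an illustrative alloc_block()), written to show the two pitfalls raised in the chat: the helper expects -ENOMEM rather than ENOMEM, and a NULL pointer is not treated as an error. It assumes a platform where casting a long to a pointer and back preserves the value, as the kernel itself does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirrors the kernel's MAX_ERRNO: errno values 1..4095 live at the top of the address space. */
#define MY_MAX_ERRNO 4095

/* error must be negative, e.g. -ENOMEM */
static inline void *my_err_ptr(long error)
{
    return (void *)error;
}

static inline long my_ptr_err(const void *ptr)
{
    return (long)ptr;
}

/* True only for the last MY_MAX_ERRNO addresses; NULL is NOT an error. */
static inline bool my_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MY_MAX_ERRNO;
}

/* Hypothetical allocator that reports why it failed instead of just returning NULL. */
void *alloc_block(size_t size)
{
    void *p;

    if (size == 0)
        return my_err_ptr(-EINVAL);
    p = malloc(size);
    if (!p)
        return my_err_ptr(-ENOMEM);
    return p;
}
```

Calling my_err_ptr(ENOMEM) with a positive value produces a small pointer that my_is_err() does not flag, which is exactly the err versus -err confusion noted above.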
2008-10-14 20:27 everybody seen vecio and syncio from tux3/super.c ?
2008-10-14 20:28 ACTION did not :(
2008-10-14 20:28 linky?
2008-10-14 20:28 http://phunq.net/ddtree?p=tux3fs;a=blob;f=fs/tux3/super.c;h=1023f06407bc8752e0afc4c2c71940023a18b9f9;hb=HEAD
2008-10-14 20:29 I have to set this repo up better
2008-10-14 20:29 junkfs_fill_super - dead code?
2008-10-14 20:29 it does the only actual work
2008-10-14 20:30 oh lol
2008-10-14 20:30 called from ext3_fill_super
2008-10-14 20:30 yeah see it now
2008-10-14 20:30 tux3_...
2008-10-14 20:30 will disappear next rev, yes it was humor
2008-10-14 20:30 anyway, there you see a far more elegant way of getting a block into memory than sb_bread
2008-10-14 20:31 we only need to know about sb_bread to know how other filesystems do it
2008-10-14 20:31 well
2008-10-14 20:31 sb_bread is still important to us
2008-10-14 20:31 let's go take a look at another part of it
2008-10-14 20:32 where it enters the buffer into the buffer cache
2008-10-14 20:32 almost forgot about that, the most important thing
2008-10-14 20:32 __getblk actually creates the buffer and does this job
2008-10-14 20:33 just like in the tux3 buffer cache emulation
2008-10-14 20:33 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1403
2008-10-14 20:33 thanks
2008-10-14 20:33 you got there seconds ahead of me ;)
2008-10-14 20:34 what I'm still unsure of...
2008-10-14 20:34 also: http://tux3.org/tux3?f=a4a6f8e640c5;file=user/test/buffer.c <- search for "bread"
2008-10-14 20:34 is why we need a buffer cache
2008-10-14 20:34 don't we have a page cache already?
2008-10-14 20:34 short answer: filesystem metadata
2008-10-14 20:35 there is a page cache dedicated to the block device itself, in addition to a page cache for each inode
2008-10-14 20:35 but when block size does not match page size, it is pretty much impossible to do locking properly with page sized units
2008-10-14 20:36 so what we do instead is use the buffers attached to the pages as our locking units
2008-10-14 20:36 we looked at that last thursday
2008-10-14 20:36 I thought we'd already put the metadata in files... ;-)
2008-10-14 20:36 but at the time did not really know what it was for
2008-10-14 20:36 heh
2008-10-14 20:36 well what about the metadata for the metadata files?
2008-10-14 20:37 ultimately we have to go cache some absolute blocks
2008-10-14 20:37 I thought that was in RAM ;-)
2008-10-14 20:37 uhm the superblock?
2008-10-14 20:37 but let's not divert the topic too much
2008-10-14 20:38 and the blocks that index the files
2008-10-14 20:38 they can't themselves be in files, unless you want a really evil result like ntfs
2008-10-14 20:38 and even then, you can't put _all_ of the file index blocks in files
2008-10-14 20:39 ok, __getblk
2008-10-14 20:39 there's a "friend of grab_cache_page"
2008-10-14 20:39 __find_get_block
2008-10-14 20:40 just a wrapper for __find_get_block_slow
2008-10-14 20:40 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1118
2008-10-14 20:40 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1370
2008-10-14 20:40 see the touch_buffer() there? that implements the lru
2008-10-14 20:41 brings the underlying page to the hot end of the lru
2008-10-14 20:41 we should be here now http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L262
2008-10-14 20:42 now we found a real friend of grab_cache_page, as opposed to a mere hanger-on
2008-10-14 20:42 find_get_page
2008-10-14 20:42 http://lxr.linux.no/linux+v2.6.26.6/mm/filemap.c#L630
2008-10-14 20:43 we don't need to go there just now, suffice to say that if the page isn't in the page cache it doesn't try to add it
2008-10-14 20:44 back at _slow...
2008-10-14 20:45 if we find a page in the page cache and it has buffers, then we loop across the buffer list mod the ratio of the buffer size to the page size
2008-10-14 20:46 ah, and we do some evil cruft with the buffer_mapped concept
2008-10-14 20:46 buffer_mapped meaning that the b_blocknr field in the buffer is filled in with a physical block number
2008-10-14 20:46 and a bit is set in the buffer flags to indicate this is so
2008-10-14 20:47 actually, that field is entirely redundant in the case of the buffer cache
2008-10-14 20:47 because we can always know the physical device offset from the page->index of the underlying page that stores the buffer data
2008-10-14 20:48 this code returns with a pointer to buffer in "ret"
2008-10-14 20:48 crufty stuff
2008-10-14 20:49 ok, that was the fast path
2008-10-14 20:49 if the buffer wasn't there then we fall onto the slow path
2008-10-14 20:49 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1403
2008-10-14 20:49 in __getblk
2008-10-14 20:50 see that hardsector size stuff
2008-10-14 20:50 largely legacy
2008-10-14 20:50 doesn't do much except create bugs these days
2008-10-14 20:51 lol
2008-10-14 20:51 in __getblk_slow we see an attempt at integration with the vm cache shrinking code
2008-10-14 20:51 it's not pretty
2008-10-14 20:52 but sometime go look at grow_buffers
2008-10-14 20:52 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1118 ?
2008-10-14 20:52 yes
2008-10-14 20:53 "__getblk() cannot fail - it just keeps trying." http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1396
2008-10-14 20:54 in spite of this assertion, the kernel is littered with code to take evasive action if getblk returns NULL
2008-10-14 20:54 lol
2008-10-14 20:55 should just let that segfault in the tux3 kernel port, perhaps with a pointer to the comment
2008-10-14 20:55 1400 * __getblk() will lock up the machine if grow_dev_page's try_to_free_buffers()
2008-10-14 20:55 1401 * attempt is failing. FIXME, perhaps?
2008-10-14 20:55 wheee
2008-10-14 20:55 try_to_free_buffers is the worst function in the entire kernel
2008-10-14 20:55 whee indeed
2008-10-14 20:56 it's vm, not vfs so we will not look at it right now
2008-10-14 20:56 all this buffer stuff is very fragile and arguably broken
2008-10-14 20:56 it's a credit to the bug chasing talents of people like akpm and linus that it works at all
2008-10-14 20:57 just to give everybody some sense of confidence in what we are about to do ;)
2008-10-14 20:57 you didn't come to university to have the truth softened, right?
2008-10-14 20:57 :P
2008-10-14 20:57 I'm trying to understand why we have per-file and per-metadata pagecache + bufferheads, instead of just device cache
2008-10-14 20:58 this is just device cache
2008-10-14 20:58 ah
2008-10-14 20:58 you mean why not throw away file caches?
2008-10-14 20:58 basically
2008-10-14 20:58 because we need to index cache objects by logical file offset
2008-10-14 20:58 or have filecaches just be pointers to the right pages of the device cache
2008-10-14 20:58 good idea
2008-10-14 20:59 probably a very good idea
2008-10-14 20:59 we're kind of in transition here
2008-10-14 20:59 unifying the page and buffer cache, which used to be a lot more separate
2008-10-14 21:00 linux 2.0 actually copied data between them to get something resembling coherence
2008-10-14 21:00 right, but right now we have a lot of copying of data around, right?
2008-10-14 21:00 so it's better than it was, which was really really awful
2008-10-14 21:00 we don't, no
2008-10-14 21:00 it's pretty much all done with pointers
2008-10-14 21:00 if you have disk - partition - lvm physical - lvm volume - lvm logical - filesystem - file
2008-10-14 21:00 then how many times do we copy 4KB of data in order to read 4KB?
2008-10-14 21:00 still all just pointers
2008-10-14 21:00 none usually
2008-10-14 21:01 sorry
2008-10-14 21:01 one
2008-10-14 21:01 copy_to_user
2008-10-14 21:01 one dma into the cache and one copy_to_user
2008-10-14 21:01 if it's memory mapped, then no copy_to_user
2008-10-14 21:01 just the dma
2008-10-14 21:03 right, because there's no device cache
2008-10-14 21:03 so unless there's some raid involved, it doesn't much matter
2008-10-14 21:04 it would be very nice to use the buffer cache as a device cache
2008-10-14 21:04 and we should
2008-10-14 21:04 -!- data(~data@echo489.server4you.de) has joined #tux3
2008-10-14 21:04 so, if instead of reading from userspace, we mmap, will that be just a dma to the mmapped region?
2008-10-14 21:04 but it is a lot of work, and as you can see, there is some very fragile code that _will_ break when we mess with this
2008-10-14 21:04 that will
2008-10-14 21:04 dma to the physical page, which is mapped into a process memory space
2008-10-14 21:05 will the dma happen on mmap, or when we later try to read a not-present page?
2008-10-14 21:05 I'm guessing the latter
2008-10-14 21:06 makes sense for it to be the latter!
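A simplified user-space model of the __getblk() walk-through in this session, assuming 4K pages, block sizes from 512 bytes to 4K, and a toy "page cache" that is just a fixed array indexed by page index, with no hashing, locking, reference counts or memory pressure. All names (cache_page, cache_buffer, find_block_model(), grow_buffers_model(), getblk_model()) are hypothetical. It illustrates two points made above: the page index and a buffer's slot within that page follow from the block number alone (index = block >> (PAGE_SHIFT - blkbits)), so a stored physical block number is redundant for the block device cache, and the lookup is a "find, else grow, then retry" loop, which is why the real function is documented as unable to fail.

```c
#include <stdint.h>
#include <stdlib.h>

#define MODEL_PAGE_SHIFT 12                            /* assume 4K pages */
#define MODEL_NR_PAGES   16                            /* toy cache spans 16 pages */
#define MODEL_MAX_BUFS   (1 << (MODEL_PAGE_SHIFT - 9)) /* at most 8 x 512-byte buffers */

struct cache_buffer {
    uint64_t blocknr;          /* redundant: derivable from the page index */
    unsigned char *data;       /* points into the owning page */
};

struct cache_page {
    uint64_t index;            /* position in the device, in pages */
    unsigned char data[1 << MODEL_PAGE_SHIFT];
    struct cache_buffer bufs[MODEL_MAX_BUFS];
    int nbufs;                 /* 0 would mean "no buffers attached" */
};

static struct cache_page *model_cache[MODEL_NR_PAGES];

/* Fast path: derive the page index and buffer slot from the block number. */
static struct cache_buffer *find_block_model(uint64_t block, unsigned blkbits)
{
    unsigned sizebits = MODEL_PAGE_SHIFT - blkbits;
    uint64_t index = block >> sizebits;

    if (index >= MODEL_NR_PAGES || !model_cache[index] || !model_cache[index]->nbufs)
        return NULL;
    return &model_cache[index]->bufs[block & ((1u << sizebits) - 1)];
}

/* Slow path: create the page, attach its buffers, enter it into the cache.
 * Returns -1 if the block lies outside the toy cache, 0 on a transient
 * allocation failure (caller retries), 1 on success. */
static int grow_buffers_model(uint64_t block, unsigned blkbits)
{
    unsigned sizebits = MODEL_PAGE_SHIFT - blkbits;
    uint64_t index = block >> sizebits;
    struct cache_page *page;

    if (index >= MODEL_NR_PAGES)
        return -1;
    page = calloc(1, sizeof *page);
    if (!page)
        return 0;
    page->index = index;
    page->nbufs = 1 << sizebits;
    for (int i = 0; i < page->nbufs; i++) {
        page->bufs[i].blocknr = (index << sizebits) + (uint64_t)i;
        page->bufs[i].data = page->data + ((size_t)i << blkbits);
    }
    model_cache[index] = page;
    return 1;
}

/* Find-or-create loop: keeps trying, giving up only on an out-of-range block. */
struct cache_buffer *getblk_model(uint64_t block, unsigned blkbits)
{
    for (;;) {
        struct cache_buffer *bh = find_block_model(block, blkbits);

        if (bh)
            return bh;
        if (grow_buffers_model(block, blkbits) < 0)
            return NULL;
    }
}
```

As in the real code, calling this with two different blkbits values for overlapping blocks would silently hand back buffers of the wrong size; the model does not guard against it.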
2008-10-14 21:06 you can make a big areas :P
2008-10-14 21:06 ok, just to wrap up our tour of getblk, the place where a buffer is actually created and inserted into the buffer cache is grow_buffers
2008-10-14 21:06 folks
2008-10-14 21:06 rather badly misnamed function, which is why I had to look at it half a dozen times to know where this happens
2008-10-14 21:06 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1081
2008-10-14 21:07 (interestingly a second implementation in raid5)
2008-10-14 21:07 grow_dev_page <- trying to continue the grand tradition of finding ever worse names for functions
2008-10-14 21:07 no wonder bsd guys tend to slit their wrists when forced to read linux code
2008-10-14 21:08 probably explains why there are so few bsd guys
2008-10-14 21:08 :-)
2008-10-14 21:08 yes, we all know linux leads to killer filesystems
2008-10-14 21:08 eek
2008-10-14 21:09 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1093 <- compare to #1108
2008-10-14 21:09 and promise me you will never write code like that
2008-10-14 21:10 I believe that does actually do what it's supposed to
2008-10-14 21:10 :D
2008-10-14 21:10 could use a comment though
2008-10-14 21:10 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1028 <- finally, here is where the work gets done
2008-10-14 21:10 we enter a page into the page cache, with buffers on it
2008-10-14 21:10 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1027
2008-10-14 21:11 I think this code actually came from me
2008-10-14 21:11 way back
2008-10-14 21:11 started as my hack to make htree work in the page cache of a file
2008-10-14 21:11 we seem to have a lot of ways to allocate memory...
2008-10-14 21:11 linus liked that idea and decided to use it for the buffer cache
2008-10-14 21:11 there doesn't seem to be much deallocation
2008-10-14 21:11 a good idea
2008-10-14 21:11 maze, there only needs to be deallocation in one spot
2008-10-14 21:12 shrink_caches
2008-10-14 21:12 that is indeed a magical and good thing
2008-10-14 21:12 yes
2008-10-14 21:12 are all page caches actually in one lru then?
2008-10-14 21:12 the kernel is this kind of organic, self-cleaning thing
2008-10-14 21:12 has to keep moving, like a shark
2008-10-14 21:13 filling up cache with new stuff about to be used, evicting old stuff to make room for it
2008-10-14 21:13 (wouldn't that lru then be a source of lock contention on multi-way smp?)
2008-10-14 21:13 yes, all pages in the system are in one lru
2008-10-14 21:13 this actually doesn't make complete sense
2008-10-14 21:13 since dirty pages these days do not tend to be evicted via the page lru at all
2008-10-14 21:13 but by inode flushes
2008-10-14 21:14 because a filesystem can't afford to have the vm writing out random pages in orders that violate ACID constraints
2008-10-14 21:15 ok we went into bonus time
2008-10-14 21:15 questions on thursday ;)
2008-10-14 21:15 :-)
2008-10-14 21:16 ;-)
2008-10-14 21:17 how'd we do on the interesting front this time?
2008-10-14 21:17 it's complex...
2008-10-14 21:17 I feel like vast pieces of this should be avoided in new fs code
2008-10-14 21:17 we're going to wallow in it, unfortunately
2008-10-14 21:18 because using anything other than buffers to access your metadata blocks leads to worse horrors
2008-10-14 21:18 ie. there should be no references to buffer heads at all
2008-10-14 21:18 nice idea except when your block size is smaller than a page
2008-10-14 21:18 see, the concept of 'metadata blocks'
2008-10-14 21:18 is something I have issue with ;-)
2008-10-14 21:18 fixing that problem will lead to reinventing the buffer cache
2008-10-14 21:19 I anxiously await your proposal to replace the notion of metadata
2008-10-14 21:19 not metadata... just metadata blocks
2008-10-14 21:19 "hyperdata"
2008-10-14 21:19 I see
2008-10-14 21:19 with?
2008-10-14 21:19 metadata extents?
2008-10-14 21:19 which would be... simpler?
2008-10-14 21:19 buy not having metadata blocks, you should be able to get acid with no effort (or very little additional effort)
2008-10-14 21:20 buy -> by
2008-10-14 21:20 what do you use instead of metadata blocks?
2008-10-14 21:21 hmm, it's hard to describe a not-fully-thought-out idea
2008-10-14 21:21 but basically a forward log
2008-10-14 21:21 what about the cache?
2008-10-14 21:21 combined with always writing to free disk space
2008-10-14 21:21 cache of what?
2008-10-14 21:21 metadata
2008-10-14 21:21 in memory structure doesn't have to have anything in common with on-disk
2008-10-14 21:22 probably some sort of tree in sparse file or something though
2008-10-14 21:22 it's very helpful if it does
2008-10-14 21:22 the buffer cache already is a tree
2008-10-14 21:22 and if you have a sparse file, you have to have metadata for that file somewhere
2008-10-14 21:22 in the tree of course ;-)
2008-10-14 21:22 where, in another file? and how does that recursion terminate?
2008-10-14 21:23 that's why you have a forward log
2008-10-14 21:23 with care ;-)
2008-10-14 21:23 but you still haven't explained how your metadata is cached
2008-10-14 21:23 the tree is in a file, the file is page cached
2008-10-14 21:23 and how are the blocks of that page cache mapped to the disk?
2008-10-14 21:24 using the tree
2008-10-14 21:24 which tree?
2008-10-14 21:24 the one stored on those blocks
2008-10-14 21:24 what if you have a cache miss on one of those blocks?
2008-10-14 21:24 yeah, it's hard to describe
2008-10-14 21:25 I should probably work it out fully...
2008-10-14 21:25 yes, and you will realize that we're not that far off with the current arrangement
2008-10-14 21:25 (a cache miss is not a big problem with a tree, so long as it's not a mere radix-tree)
2008-10-14 21:25 the part that sucks is being stuck with page size resolution, we need more flexibility than that
2008-10-14 21:26 which is what this whole creaky mess of buffer_heads is about
2008-10-14 21:26 it's a solution, just not a good solution
2008-10-14 21:26 improving it would be a good project
2008-10-14 21:26 not a summer project though
2008-10-14 21:26 right
2008-10-14 21:27 I can understand why it's done the way it is
2008-10-14 21:27 it's just duplication of code/concepts and multiple opportunities to screw up and get locking wrong in edge cases
2008-10-14 21:27 a couple of things I propose to do about it
2008-10-14 21:27 1) let struct page denote objects with sizes larger and smaller than page size, thus obviating the need for struct buffer_head
2008-10-14 21:28 larger... probably easy
2008-10-14 21:28 smaller... ur
2008-10-14 21:28 brain fault
2008-10-14 21:28 2) unify the page and buffer cache so that a miss in a page cache then looks in the buffer cache to see if the page is there, so we use the buffer cache as a large device cache
2008-10-14 21:29 3) implement physical readahead in the unified cache
2008-10-14 21:29 4) implement active page table defragmentation so that we can realistically work with larger block sizes
2008-10-14 21:29 5) dynamically allocate struct pages ;-)
2008-10-14 21:30 that's part of (1)
2008-10-14 21:30 yeah, I was wondering if you meant to include that or not
2008-10-14 21:31 a crude form
2008-10-14 21:31 only dynamically allocate to fill in the gaps between the ones in the array
2008-10-14 21:31 gaps
2008-10-14 21:31 ?
2008-10-14 21:32 I think if you want to do dynamic allocation of struct page it's an all-or-nothing scenario
2008-10-14 21:32 yes, you have an array of 4k physical pages, but want to have 1K struct pages, so 3 1K struct pages go between each two 4K physical pages
2008-10-14 21:32 not at all
2008-10-14 21:32 currently physical page address -> struct page is a (PA>>PAGE_SHIFT)*sizeof(struct page) + base operation
2008-10-14 21:32 just dynamically allocating for the sub-physical sized pages works out ok
2008-10-14 21:33 you could get much more invasive about this, but probably not a good idea for a first try
2008-10-14 21:33 ACTION says good night (and thanks for the lecture)
2008-10-14 21:33 by virtue of what a 'page' is for the cpu, I'm not sure sub-pagesize pages are realistic
2008-10-14 21:33 night raz
2008-10-14 21:33 good night
2008-10-14 21:33 what happens when you have conflicting access permissions on two sub-pages?
2008-10-14 21:34 the sub-pagesize pages are basically just for locking
2008-10-14 21:34 which is the only really indispensable thing that buffer_heads do at present
2008-10-14 21:34 don't conflict
2008-10-14 21:34 subpages are not entered into page table entries
2008-10-14 21:34 they can't be
2008-10-14 21:34 in that case couldn't we just have a byte of 8 bits for 8 512-byte locks in struct page?
2008-10-14 21:35 possibly, but the page-oriented code uses more fields than that
2008-10-14 21:35 for example, the ->index
2008-10-14 21:35 used to locate the page in a page cache
2008-10-14 21:35 we want all that code to continue to work
2008-10-14 21:36 otherwise we have a massive rewrite in store for everything that touches a page
2008-10-14 21:37 a change like this...
2008-10-14 21:37 it would probably end up with a massive rewrite almost any decent way you do it
2008-10-14 21:38 I don't think it'd be possible to have some sort of shim compatibility translation layer
2008-10-14 21:38 you could potentially leave buffer_heads around, until everything had been ported...
2008-10-14 21:39 but actually have the same interface... unlikely
2008-10-14 21:44 the subpage concept isn't that big a deal
2008-10-14 21:45 mostly just affects things like grab_cache_page that we looked at
2008-10-14 21:45 a whole bunch of block io library cruft goes away
2008-10-14 21:45 because we lose the list of buffers per page
2008-10-14 22:05 http://www.newmobilecomputing.com/thread?333779
2008-10-14 22:06 where?
2008-10-14 22:06 ?where?
2008-10-14 22:07 oh
2008-10-14 22:07 mention of ftux3
2008-10-14 22:07 trying to locate the location
2008-10-14 22:07 of the summit
2008-10-14 22:07 what summit?
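A sketch of the two struct page ideas from the sub-page discussion above, assuming a flat memory model, 4K pages and 512-byte sectors: the physical-address-to-struct-page lookup is plain array indexing by page frame number, mem_map + (pa >> PAGE_SHIFT) (C pointer arithmetic supplies the sizeof(struct page) factor), and the suggested one-byte field holds eight lock bits, one per 512-byte sector. The sketch_page type and its helpers are hypothetical and single-threaded (a real version would need atomic bit operations); the sketch also deliberately ignores the objection raised in the chat that page-oriented code needs more per-subpage state, such as ->index, than a lock bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT   12   /* assume 4K pages */
#define SKETCH_SECTOR_SHIFT 9    /* 512-byte sectors: 8 per page */
#define SKETCH_NR_PAGES     64   /* toy "physical memory" of 256K */

struct sketch_page {
    uint8_t sector_locks;        /* bit i set: sector i of this page is locked */
};

static struct sketch_page sketch_mem_map[SKETCH_NR_PAGES];

/* Flat model: struct page for pa is mem_map + (pa >> PAGE_SHIFT).
 * Caller must keep pa below SKETCH_NR_PAGES << SKETCH_PAGE_SHIFT. */
static struct sketch_page *pa_to_page(uint64_t pa)
{
    return sketch_mem_map + (pa >> SKETCH_PAGE_SHIFT);
}

/* Which 512-byte sector of its page the address falls in (0..7). */
static unsigned sector_in_page(uint64_t pa)
{
    return (unsigned)((pa >> SKETCH_SECTOR_SHIFT) &
                      ((1u << (SKETCH_PAGE_SHIFT - SKETCH_SECTOR_SHIFT)) - 1));
}

/* Non-blocking lock of the sector containing pa; not atomic, single-threaded only. */
bool sector_trylock(uint64_t pa)
{
    struct sketch_page *page = pa_to_page(pa);
    uint8_t bit = (uint8_t)(1u << sector_in_page(pa));

    if (page->sector_locks & bit)
        return false;
    page->sector_locks |= bit;
    return true;
}

void sector_unlock(uint64_t pa)
{
    pa_to_page(pa)->sector_locks &= (uint8_t)~(1u << sector_in_page(pa));
}
```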
2008-10-14 22:07 oh right yeah
2008-10-14 22:07 the one in the article
2008-10-14 22:07 was a pretty lame summit
2008-10-14 22:07 flips: you get google alerts for tux3 also i see ;)
2008-10-14 22:07 oh that summit
2008-10-14 22:07 even worse
2008-10-14 22:08 shapor, they're great
2008-10-14 22:08 NYC
2008-10-14 22:09 http://www.linux.com/feature/132203 <- joe barr's take
2008-10-14 22:09 all in all, a very bad thing
2008-10-14 22:09 for linux
2008-10-14 22:09 getting way too much corp politics in the works
2008-10-14 22:10 linux foundation... not really representing the community
2008-10-14 22:10 http://www.austinlug.org/node/259
2008-10-14 22:12 "Just out of respect for the natives of Austin, they should have made a choice not to slouch back on bureaucratic policy and instead, make an exception to that policy in order to be good guests and pay respect to the local Linux Kernel enthusiasts.
2008-10-14 22:12 Instead, they big-timed him and sent him home. That's when you know your movement has been co-opted and it's no longer a progressive social force."
2008-10-14 22:16 http://blog.internetnews.com/skerner/2008/10/no-press-at-linux-foundation-e.html
2008-10-14 22:51 hey all
2008-10-14 22:51 hi
2008-10-14 22:52 hello flips
2008-10-14 23:06 say goodnight all
2008-10-14 23:06 hey pranith
2008-10-14 23:06 hey tim_dimm
2008-10-14 23:06 past my bedtime
2008-10-14 23:07 off to sleep huh?
2008-10-14 23:07 yup
2008-10-14 23:07 hmm
2008-10-14 23:07 twins wore me out today
2008-10-14 23:07 goodnight then
2008-10-14 23:07 :)
2008-10-14 23:07 :-)
2008-10-14 23:07 hmm, lucky you
2008-10-14 23:07 boy and a girl
2008-10-14 23:07 very lucky
2008-10-14 23:07 yeah, i remember :)
2008-10-14 23:07 later guys
2008-10-14 23:46 -!- cydork(~cydoork@122.169.100.164) has joined #tux3
2008-10-14 23:48 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3
2008-10-15 02:07 the eee is back
2008-10-15 02:07 turns out what you have to do is spam the F9 key to get to the grub menu
2008-10-15 02:07 not what it says on the boot screen
2008-10-15 02:08 also there is advice out there for me that the little fan is so pathetic you can just disconnect it without affecting the operating temperature
2008-10-15 02:08 that is good news to me because the only thing I really hate about this little guy is the fan noise
2008-10-15 02:25 flips: wasn't the eeepc supposed to be ultra-silent?
2008-10-15 02:26 no
2008-10-15 02:26 not one of the advertised features
2008-10-15 02:26 but it is if you disconnect the fan
2008-10-15 02:27 which implies underclocking it, I assume
2008-10-15 02:27 http://www.newegg.com/Product/Product.aspx?Item=N82E16883220004
2008-10-15 02:27 "It is quieter than a whisper"
2008-10-15 02:27 on the other hand, silentpcreview measured the eee box pc at 22 db
2008-10-15 02:27 whereas asus claims 26 db, which is already extremely quiet
2008-10-15 02:27 901 maybe
2008-10-15 02:28 mine is a 900, has a celeron-m
2008-10-15 02:28 flips: btw, how come you were not in the linux foundation summit? btrfs and ext4 were presented
2008-10-15 02:28 next year when there is working code
2008-10-15 02:28 actually, summit doesn't do anything of use
2008-10-15 02:29 we've already demonstrated we can get just as much pr from grassroots as you can by summiting
2008-10-15 02:29 with internet, conferences are mostly a place to meet people and see friends you seldom see :-)
2008-10-15 02:29 right, it's mainly about the drunk
2008-10-15 02:29 :-D
2008-10-15 02:29 done plenty of those things in the past
2008-10-15 02:30 I'm actually more interested in outside events now
2008-10-15 02:30 doing linux conferences gets to be a lot like preaching to the choir
2008-10-15 02:31 indeed
2008-10-15 02:31 speaking of pr, there should be another post by tomorrow
2008-10-15 02:31 on design details of atomic commit
2008-10-15 02:31 code will follow not too long after
2008-10-15 02:32 you gotta write that book on implementation of filesystems in linux :-)
2008-10-15 02:32 maybe just collect all the posts and put a cover on them?
2008-10-15 02:32 heh, some cleaning and editing would be needed :-) 2008-10-15 02:33 for instance, put the code in the appropiate place among the text 2008-10-15 02:33 :-) 2008-10-15 06:05 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-15 08:05 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-15 09:08 -!- mingming(~mingming@bi01p1.co.us.ibm.com) has joined #tux3 2008-10-15 09:09 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-15 10:03 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-15 11:04 -!- mingming(~mingming@bi01p1.co.us.ibm.com) has joined #tux3 2008-10-15 11:05 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-15 13:05 -!- natalie(~nataliep@207.47.98.129.static.nextweb.net) has joined #tux3 2008-10-15 13:07 hi mingming 2008-10-15 14:41 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-10-15 15:22 folks 2008-10-15 15:50 woohoo, I now have a fanless eee 900 2008-10-15 15:50 http://wiki.eeeuser.com/howto:disconnect_fan?s=turn%20off%20fan 2008-10-15 15:50 ACTION <- modder 2008-10-15 15:50 minor mod 2008-10-15 15:51 nothing melting so far 2008-10-15 15:52 there are officially no moving parts on this eee now, unless you count the keyboard 2008-10-15 16:39 sk8 oclock 2008-10-15 17:52 -!- less(~less@145-116-238-192.uilenstede.casema.nl) has joined #tux3 2008-10-15 18:11 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-15 19:50 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-15 20:23 -!- nataliep(~nataliep@72.14.224.1) has joined #tux3 2008-10-15 20:39 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-15 22:17 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-15 23:51 flips: did you see this? 
http://blogs.sun.com/erwann/entry/zfs_on_the_desktop_zfs 2008-10-15 23:52 fuse? 2008-10-15 23:54 oh, solaris 2008-10-15 23:54 hey, I have a great idea 2008-10-15 23:54 why don't we give Gnome to sun to use exclusively on Solaris? 2008-10-15 23:54 kill two birds with one stone 2008-10-15 23:54 :-D 2008-10-15 23:54 you are bad, very bad 2008-10-15 23:55 sometimes I don't try to hide it 2008-10-15 23:55 some guy implemented something like that for KDE3 a few years ago for NILFS 2008-10-15 23:55 but it was never committed to KDE's svn repository 2008-10-15 23:56 previous versions works with ddsnap 2008-10-15 23:56 I forget what the gui front end was 2008-10-15 23:56 so... what about a time travel interface for kde, and what to test it with? 2008-10-15 23:57 it'd be nice 2008-10-15 23:57 2008-10-15 23:58 by the way, I wonder where people got the idea that infinite snapshots come for free 2008-10-15 23:58 there is this thing called churn 2008-10-15 23:59 and if you've got it, kiss your time travel goodbye 2008-10-15 23:59 http://www.sandeepranade.com/html/ComputerScience/time-travelling-file-manager.html 2008-10-15 23:59 it will be back to good ol hourly/daily/weekly and precious few of the latter 2008-10-15 23:59 oh, dead 2008-10-16 00:00 wayback? 2008-10-16 00:01 http://web.archive.org/web/20080128190254/http://www.sandeepranade.com/html/ComputerScience/time-travelling-file-manager.html 2008-10-16 00:01 yeah 2008-10-16 00:01 ext3cow, the obvious tool to test it with 2008-10-16 00:02 that was it, not NILFS as I said 2008-10-16 00:02 and I wonder when the ext3cow guys are going to go for kernel merge 2008-10-16 00:02 when they are able to rm again? 
:-) 2008-10-16 00:02 they can rm, it just doesn't go away 2008-10-16 00:03 btw, a few months ago I briefly looked into adding previous versions support to DolphinPart (which is what KDE uses these days, both in Dolphin and in Konqueror) and it did not look too hard 2008-10-16 00:03 should be integrated with a generic time shifter 2008-10-16 00:03 I like the zoom buttons on the link above 2008-10-16 00:04 the other thing needed on such a slider is little marks where significant changes actually happened 2008-10-16 00:04 I don't know how you'd do that 2008-10-16 00:04 anything at all, to show where the activity was 2008-10-16 00:05 a little squashed histogram of activity instead of linear time scale, maybe 2008-10-16 00:06 adding little marks where significant changes actually happened should not be difficult, but how to find significant changes? a change which affects many files (and how many is "many"?) ? a change which affects a large amount of data (is deleting a DVD-ripped movie actually a big change?) ? 2008-10-16 00:07 exactly 2008-10-16 00:07 well there we have an advantage in tux3, you see all the changes at the same time, at least for one file 2008-10-16 00:08 some kind of out of band activity report 2008-10-16 00:08 we need that actually, for the new generation of filesystems 2008-10-16 00:08 like git and hg have 2008-10-16 00:09 do git and hg have heuristics to tell you "hey, this was a HUGE change"? I didn't know 2008-10-16 00:09 they don't report that 2008-10-16 00:10 it's interesting stuff, we should be mining that data out of our filesystems somehow 2008-10-16 00:10 something to think about 2008-10-16 00:17 ACTION turns in early 2008-10-16 00:17 got to get back in the coding saddle tomorrow, plus post a new post 2008-10-16 00:20 see you! 
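One way to build the "squashed histogram of activity" for a time slider is to bucket change events by time and mark the buckets whose volume crosses a threshold. The change_event record is hypothetical, because it assumes some out-of-band activity log exists. Measuring activity in bytes is only one answer to the "how many is many" question raised above. Counting files touched would be another, and the threshold is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical change record, e.g. read from an out-of-band activity log. */
struct change_event {
    int64_t time;      /* seconds since some epoch */
    uint64_t bytes;    /* bytes written or freed by this change */
};

/* Sum changed bytes into n_buckets buckets of bucket_secs seconds each,
 * starting at `start`; events outside that window are ignored. Sets
 * marks[i] to 1 for buckets whose total reaches `threshold` and returns
 * how many buckets were marked. */
size_t mark_busy_buckets(const struct change_event *ev, size_t n_ev,
                         int64_t start, int64_t bucket_secs,
                         uint64_t *totals, unsigned char *marks,
                         size_t n_buckets, uint64_t threshold)
{
    size_t marked = 0;

    for (size_t i = 0; i < n_buckets; i++)
        totals[i] = 0;

    for (size_t i = 0; i < n_ev; i++) {
        if (ev[i].time < start)
            continue;
        int64_t b = (ev[i].time - start) / bucket_secs;
        if (b >= (int64_t)n_buckets)
            continue;
        totals[b] += ev[i].bytes;
    }

    for (size_t i = 0; i < n_buckets; i++) {
        marks[i] = totals[i] >= threshold;
        marked += marks[i];
    }
    return marked;
}
```

A slider could then draw a tick at every marked bucket, or scale the time axis by the per-bucket totals.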
2008-10-16 00:20 pgquiles_, I hope somebody picks up that time travel interface work again, most practically with ext3cow I think 2008-10-16 00:20 I hope I have time to do that 2008-10-16 00:20 :-) 2008-10-16 00:20 gnight 2008-10-16 01:56 -!- pranith(ca4bcee2@webchat.mibbit.com) has joined #tux3 2008-10-16 02:09 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-16 07:56 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-16 08:29 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-16 09:45 -!- Bobby_(~Bobby@122.162.71.144) has joined #tux3 2008-10-16 12:31 -!- bushman(~marcin@c-76-23-106-132.hsd1.sc.comcast.net) has joined #tux3 2008-10-16 13:28 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-16 15:00 need a catchy subject line for the follow up post to thinking about syncing 2008-10-16 15:34 -!- flips(~phillips@phunq.net) has joined #tux3 2008-10-16 15:58 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-16 16:52 title for the next post is set 2008-10-16 16:52 "Of Phases, Quanta, and Episodes" 2008-10-16 16:54 nearly sk8 oclock 2008-10-16 16:57 -!- ChanServ changed mode/#tux3 -> +o flips 2008-10-16 16:57 -!- flips changed topic to "http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 pm Pacific Time ~ Next session: page locking and IO life cycle" 2008-10-16 16:58 -!- ChanServ changed mode/#tux3 -> -o flips 2008-10-16 16:58 -!- ChanServ changed mode/#tux3 -> +o flips 2008-10-16 16:59 -!- flips changed topic to "http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 pm Pacific Time ~ Next session: IO life cycle of a page" 2008-10-16 16:59 -!- ChanServ changed mode/#tux3 -> -o flips 2008-10-16 16:59 scans better 2008-10-16 18:00 -!- tux3bot(~tux3bot@yzf.shapor.com) has joined #tux3 2008-10-16 18:03 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-16 19:45 -!- 
FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-16 19:48 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-16 19:48 Hi! 2008-10-16 19:52 hi 2008-10-16 19:52 ACTION has 8 minutes to shower after the sk8 2008-10-16 19:53 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-16 19:53 hi 2008-10-16 19:57 lhi raluca 2008-10-16 19:59 2 days into life with its fan disconnected and the eee is happy and healthy 2008-10-16 19:59 recommended imho 2008-10-16 19:59 :-) 2008-10-16 19:59 what are you using it for? 2008-10-16 19:59 now I just need to do the spacebar mod and it will be a fine machine 2008-10-16 19:59 use it for a laptop 2008-10-16 19:59 much nicer to carry around than the thinkpad 2008-10-16 20:00 doesn't bend your shoulder 2008-10-16 20:00 :-) 2008-10-16 20:00 in fact, fits in the _flap_ of my camera backpack, and it isn't a big backpack 2008-10-16 20:00 mine is ;-) 2008-10-16 20:00 what kind of camera in it? 
2008-10-16 20:00 whoops 2008-10-16 20:00 usually the canon 10d 2008-10-16 20:01 I can't know that, it's tux3 u kind 2008-10-16 20:01 ah 2008-10-16 20:01 20d here 2008-10-16 20:01 old but awesome camera :D 2008-10-16 20:01 oh yes 2008-10-16 20:01 even better :P 2008-10-16 20:01 40d will replace it as soon as I get clearance from my wife 2008-10-16 20:01 :D 2008-10-16 20:01 this is entirely justified by the 11000 baby pictures I took in the last two years 2008-10-16 20:01 ACTION is ready for tux3 2008-10-16 20:02 :-) :-) :-) 2008-10-16 20:02 ok, let's go 2008-10-16 20:02 today will be nittier and grittier 2008-10-16 20:02 let's go look at block_read_full_page 2008-10-16 20:03 ACTION listens to the sound of browsers revving up 2008-10-16 20:03 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L2086 2008-10-16 20:04 beat me ;) 2008-10-16 20:04 on entry to this function the page must be locked 2008-10-16 20:05 or this task will oops with a BUG 2008-10-16 20:05 so where does the page get unlocked? 2008-10-16 20:05 see if you can find it, max 3 minutes 2008-10-16 20:06 ACTION takes the chance to snag a glass of vino 2008-10-16 20:07 ACTION is searching 2008-10-16 20:08 talk about your reasoning as you go if you like 2008-10-16 20:09 ok, hint: we've seen the mechanism before 2008-10-16 20:09 right now I'm searching for unlock_page usage 2008-10-16 20:09 I looked for block_read_full_page first 2008-10-16 20:09 didn't get to far 2008-10-16 20:09 when _should_ the page taht is being read be unlocked 2008-10-16 20:09 ? 2008-10-16 20:10 I mean, if you were designing your own os 2008-10-16 20:10 when will be evicted 2008-10-16 20:10 sorry 2008-10-16 20:10 when is not in use anymore 2008-10-16 20:10 apology accepted ;) 2008-10-16 20:11 when it has been successfully read 2008-10-16 20:11 that is when it should be unlocked 2008-10-16 20:11 aaa... 
2008-10-16 20:11 I was very wrong :P 2008-10-16 20:11 then all tasks blocked trying to get the page lock will be unlocked, and can then read the data on the page 2008-10-16 20:11 sure, but you thought about it 2008-10-16 20:11 that's the important step 2008-10-16 20:11 ok, so the locking is only for waiting, right? 2008-10-16 20:12 locking is always only for waiting 2008-10-16 20:12 true 2008-10-16 20:12 locking enforces synchronized access to data by making tasks wait 2008-10-16 20:13 in this case, tasks must wait because the page does not contain valid data yet, the data has to be read from disk 2008-10-16 20:13 ack 2008-10-16 20:13 the very first task that needs the data will grab the lock and then be responsible for launching the read 2008-10-16 20:13 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L2154 2008-10-16 20:14 that's one unlock location 2008-10-16 20:14 not the common one 2008-10-16 20:14 so it makes perfect sense to check if the lock is on 2008-10-16 20:14 if there is none then nobody is waiting 2008-10-16 20:14 what about read ahead? :P 2008-10-16 20:15 whole nuther topic 2008-10-16 20:15 in fact, this page could be being read as part of readahead 2008-10-16 20:15 the locking logic doesn't change 2008-10-16 20:15 the unlock happens in the bio endio 2008-10-16 20:15 does it make sense to wait for a read ahead? 2008-10-16 20:16 much as maze implemented in junkfs 2008-10-16 20:16 yes, it is mandatory to wait for the readahead 2008-10-16 20:16 any task waiting on the lock intends to use the data on the page 2008-10-16 20:16 it could be a parallel task reading at a different place in the file 2008-10-16 20:17 or a page fault caused by memory access to mmap region 2008-10-16 20:17 we don't care why somebody is trying to read the page, only that the read is happening 2008-10-16 20:17 now, where does the page get locked? 
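The read-side protocol described here has three steps. The first task takes the page lock and launches the read. Later tasks sleep on the lock. The IO completion then marks the page up to date and unlocks it, which wakes every waiter. Below is a toy, single-threaded model of that protocol, not kernel code. The names toy_page, toy_lock_or_wait and toy_read_end_io are hypothetical. Waiters are only counted rather than put to sleep, and the completion is called by hand where the kernel would run it from the bio end_io path.

```c
#include <stdbool.h>

/* Toy model of the read-side page lock: the lock means "IO in progress,
 * data not valid yet"; it exists only to make other tasks wait. */
struct toy_page {
    bool locked;      /* models PG_locked */
    bool uptodate;    /* models PG_uptodate */
    int waiters;      /* tasks that would be sleeping on the lock */
    int woken;        /* tasks woken by the most recent unlock */
};

/* First reader takes the lock and becomes responsible for the read.
 * Returns false if the lock is already held; the caller is then counted
 * as a waiter instead of starting a second read. */
bool toy_lock_or_wait(struct toy_page *p)
{
    if (p->locked) {
        p->waiters++;
        return false;
    }
    p->locked = true;
    return true;
}

/* Completion handler, standing in for the end_io callback: record
 * whether the data is valid, then unlock and wake every waiter, which
 * can now read (or report an error on) the page. */
void toy_read_end_io(struct toy_page *p, bool ok)
{
    p->uptodate = ok;
    p->locked = false;
    p->woken = p->waiters;
    p->waiters = 0;
}
```

The same shape covers readahead. The task that started the IO does not matter to the waiters, only whether the read has finished.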
2008-10-16 20:17 3 minute search 2008-10-16 20:17 :D 2008-10-16 20:18 reason out loud 2008-10-16 20:18 hint: we've been in the neighbourhood before 2008-10-16 20:18 the locking should happen in somebody that needs a page 2008-10-16 20:19 grab_page or some of the friends :P 2008-10-16 20:19 right, and what piece of code might? 2008-10-16 20:19 grab_page doesn't actually read a page 2008-10-16 20:19 it's part of the mechanism for doing file write 2008-10-16 20:19 (that was another hint) 2008-10-16 20:20 well... we need the data from the page so to me it make sense to look in the read part 2008-10-16 20:20 yes 2008-10-16 20:20 what read part should be look at? 2008-10-16 20:21 get_block? checking now... 2008-10-16 20:21 get_block just performs the mapping between a logical block number and a physical block number, it doesn't actually do IO on the block 2008-10-16 20:22 (though in the tux3 user space code, our equivalent does) 2008-10-16 20:22 http://lxr.linux.no/linux+v2.6.26.6/mm/filemap.c#L1170 2008-10-16 20:23 generic_blah_read 2008-10-16 20:23 it is actually the responsibility of the application, in most cases a filesystem, to lock the page 2008-10-16 20:23 go_generic_file_read? 2008-10-16 20:23 maybe 2008-10-16 20:23 because I still don't see the locking :P 2008-10-16 20:24 let's find the actual call to lock_page here 2008-10-16 20:24 yes, do_* 2008-10-16 20:24 you see the for loop 2008-10-16 20:24 something interesting: http://lxr.linux.no/linux+v2.6.26.6/mm/filemap.c#L61 2008-10-16 20:25 http://lxr.linux.no/linux+v2.6.26.6/mm/filemap.c#L894 yes... 2008-10-16 20:25 yes, that's a message to you straight from akpm 2008-10-16 20:25 the first to actually bother to write this stuff down 2008-10-16 20:25 in fact, what we are covering here today is written in no book 2008-10-16 20:26 http://lxr.linux.no/linux+v2.6.26.6/mm/filemap.c#L1003 2008-10-16 20:26 one day maybe we will write the book, can you write well? 
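A get_block method only translates a logical block number in a file into a physical block number; it does not read the data. The sketch below shows that translation over an in-memory, sorted extent list. The extent layout and toy_map_block are assumptions for illustration. The kernel's get_block_t also takes the inode, a buffer_head to fill in, and a create flag. A real filesystem may also have to read metadata blocks just to find the mapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: logical blocks [logical, logical + count) of the
 * file live at physical blocks [physical, physical + count). */
struct extent {
    uint64_t logical;
    uint64_t physical;
    uint64_t count;
};

/* Toy analogue of a get_block lookup: map one logical block to a
 * physical block using extents sorted by logical start. Pure
 * translation, no IO. Returns 0 and sets *phys, or -1 for a hole. */
int toy_map_block(const struct extent *ext, size_t n, uint64_t logical,
                  uint64_t *phys)
{
    size_t lo = 0, hi = n;

    /* Binary search for the first extent starting after `logical`. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ext[mid].logical <= logical)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return -1;                       /* before the first extent */

    const struct extent *e = &ext[lo - 1];
    if (logical >= e->logical + e->count)
        return -1;                       /* falls in a hole */

    *phys = e->physical + (logical - e->logical);
    return 0;
}
```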
2008-10-16 20:26 ACTION writes very badly 2008-10-16 20:26 but a book about FS is a great idea! :D 2008-10-16 20:26 really-really great 2008-10-16 20:26 research well then? 2008-10-16 20:26 that I already know 2008-10-16 20:27 good researcher 2008-10-16 20:27 ok, that is one important place the page gets locked 2008-10-16 20:27 but I don't think it's the main one, let me check 2008-10-16 20:28 oh yes it is 2008-10-16 20:28 you nailed it 2008-10-16 20:28 ok, do_blah_read is far from the only place a page can be read 2008-10-16 20:29 it's the not up to date branch so it make some sense to be the main one 2008-10-16 20:29 the _get_block method may have to read one or more pages to figure out what the physical mapping for a file page is 2008-10-16 20:29 yes, it is 2008-10-16 20:30 the other big branch is the not present branche 2008-10-16 20:30 we're not going over those details today, though we should later 2008-10-16 20:30 ack 2008-10-16 20:31 ok, what else do we need to know about page locking? 2008-10-16 20:31 what we didn't do is the buffer locking, which is mixed together with the page locking in block_read_full_page 2008-10-16 20:31 I think we will leave that for later, we've done enough buffers recently 2008-10-16 20:32 suffice to say, that that part is an unholy mess 2008-10-16 20:32 :-) 2008-10-16 20:32 (I don't see the buffer locking in block_read_full_page :() 2008-10-16 20:32 I should mention that the locking path we just looked at used to be the main path for file reading in the past 2008-10-16 20:33 it isn't any more 2008-10-16 20:33 which one is it? 2008-10-16 20:34 first, take a look at this: http://lxr.linux.no/linux+v2.6.26.6/mm/filemap.c#L1021 2008-10-16 20:35 error = mapping->a_ops->readpage(filp, page); 2008-10-16 20:35 looks like the main read, correct? 
2008-10-16 20:35 right 2008-10-16 20:36 the truth is, the fs does not have to limit itself to reading just this page 2008-10-16 20:36 it must read the page asked for, or you see we return EIO here 2008-10-16 20:36 but it can also read a bunch more pages at the same time 2008-10-16 20:37 which new incarnations of ext3 does 2008-10-16 20:37 we will take a look at that another time 2008-10-16 20:37 it's the multipage path 2008-10-16 20:37 a whole, big, much messier topic 2008-10-16 20:37 just a quick q: there is a generic implementation for readpage somewhere, right? 2008-10-16 20:37 yes, we started with it today 2008-10-16 20:38 block_read_full_page 2008-10-16 20:38 aaaaaa :D 2008-10-16 20:38 ok 2008-10-16 20:38 look for all occurrences, you will find about one/fs 2008-10-16 20:38 tux3 will have one too 2008-10-16 20:38 (the romfs implements it :P) 2008-10-16 20:38 well 2008-10-16 20:38 sorry 2008-10-16 20:38 tux3 is going to ignore block_read_full_page and do the io by a different method I think 2008-10-16 20:39 as you can see, the block_read_full_page function is rather more convoluted that you would expect 2008-10-16 20:39 mpage_readpage 2008-10-16 20:39 that costs cpu, and worse, does operations a page at a time, at best 2008-10-16 20:39 right 2008-10-16 20:39 I haven't really looked at that in depth myself 2008-10-16 20:40 we'll do it as a group, in fact that can be homework for next time 2008-10-16 20:40 ack 2008-10-16 20:40 :D 2008-10-16 20:40 read the mpage path 2008-10-16 20:40 noted 2008-10-16 20:40 ok, that is only half the story of page locking lifecycle 2008-10-16 20:40 the other half is on the write side 2008-10-16 20:41 so lets start similarly by looking at block_write_full_page (again) 2008-10-16 20:41 http://lxr.linux.no/linux+v2.6.26.5/fs/buffer.c#L2801 2008-10-16 20:41 :) 2008-10-16 20:41 and then http://lxr.linux.no/linux+v2.6.26.5/fs/buffer.c#L1645 2008-10-16 20:42 same locking check :P 2008-10-16 20:42 right 2008-10-16 20:42 good 2008-10-16 
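The reader side of the generic read path walked through above has a fixed shape:

- Take the fast path if the page is already up to date.
- Otherwise lock the page and re-check it, because another task may have completed the read while this one slept on the lock.
- Call ->readpage, then wait for the unlock.
- Return EIO if the page still is not up to date.

The toy below models that shape in a single thread. toy_read_page and the two example hooks are hypothetical. The real code in mm/filemap.c has more branches, such as the truncation check after taking the lock, readahead and retries.

```c
#include <errno.h>
#include <stdbool.h>

struct toy_page {
    bool locked;
    bool uptodate;
};

/* Hypothetical stand-in for ->readpage: called with the page locked and
 * responsible for unlocking it when the read finishes. The example
 * hooks below complete synchronously. */
typedef int (*toy_readpage_fn)(struct toy_page *p);

int toy_read_page(struct toy_page *p, toy_readpage_fn readpage)
{
    if (p->uptodate)
        return 0;                 /* fast path: no locking at all */

    /* The kernel's lock_page() would sleep here; the toy assumes the
     * lock is free. */
    p->locked = true;
    if (p->uptodate) {            /* re-check: matters only with concurrency */
        p->locked = false;
        return 0;
    }

    int err = readpage(p);        /* unlocks the page when the IO is done */
    if (err)
        return err;

    /* Kernel: wait until the page is unlocked, then inspect the result. */
    if (!p->uptodate)
        return -EIO;
    return 0;
}

int toy_reads = 0;                /* counts calls to the example hooks */

/* Example hook: the read succeeds, then "end_io" marks valid and unlocks. */
int toy_readpage_ok(struct toy_page *p)
{
    toy_reads++;
    p->uptodate = true;
    p->locked = false;
    return 0;
}

/* Example hook: a media error, so the page is unlocked but not valid. */
int toy_readpage_bad(struct toy_page *p)
{
    toy_reads++;
    p->locked = false;
    return 0;
}
```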
20:42 io is by nature symmetric 2008-10-16 20:42 in linux you often have to see past a lot of cruft to see that 2008-10-16 20:43 what would be the equivalent of read ahead for write? :P 2008-10-16 20:43 aaa... flushing 2008-10-16 20:43 write_garbage_ahead? 2008-10-16 20:43 sure 2008-10-16 20:43 I should have though more before asking :P 2008-10-16 20:43 flushing 2008-10-16 20:43 me too 2008-10-16 20:43 couldn't resist the temptation to make a joke 2008-10-16 20:44 write_garbage_ahead is funny :P 2008-10-16 20:44 locking strategy is assymetric in a surprising way here 2008-10-16 20:44 many loops in this functions... 2008-10-16 20:45 ...reading code here... 2008-10-16 20:45 function 2008-10-16 20:45 yes, it's going steadily cruftier over time 2008-10-16 20:45 has reached a truly startling stage by now 2008-10-16 20:45 we can skip the first one, right? :D 2008-10-16 20:45 sure, and look at this: http://lxr.linux.no/linux+v2.6.26.5/fs/buffer.c#L1748 2008-10-16 20:46 unlock_page(page); 2008-10-16 20:46 :D 2008-10-16 20:46 symmery, we haz it! 2008-10-16 20:46 that interesting thing is, this is done right after the submit_bh, which doesn't wait for the actual IO to take place 2008-10-16 20:46 this is the asymmetric part 2008-10-16 20:46 the page is unlocked during the actual write, but for a read it is locked 2008-10-16 20:47 why do you suppose that might be? 2008-10-16 20:47 for read we need the content so we need to wait for the result 2008-10-16 20:47 right, and why can we drop the lock for the write? 2008-10-16 20:47 for write we don't need to wait if we don't care if it fails :P 2008-10-16 20:48 and what advantage is there to dropping the lock for write? 2008-10-16 20:48 stupid q: the locks are counting locks? 
2008-10-16 20:48 the truth is, I don't really know the advantage, it is always a racy bug to write to a page that is in process of being written to media 2008-10-16 20:49 no, the locks are nonrecursive 2008-10-16 20:49 good guess though 2008-10-16 20:49 in some cases, there is no way to prevent the race of writing to a page that is currently being transferred to disk 2008-10-16 20:49 actually, this unlock will wakeup somebody... 2008-10-16 20:50 true, which will do a racy, useless write the the page 2008-10-16 20:50 who will be wake up? 2008-10-16 20:50 good question 2008-10-16 20:50 some buggy application probably 2008-10-16 20:50 there much be a lock somewhere... 2008-10-16 20:51 we check for the lock to be on at the start of the function 2008-10-16 20:51 the reason this is always a race is, a write to this memory location while the page is in flight could easily take pace at exactly the same time as the dma transfer 2008-10-16 20:51 so we can't predict whether the page will have new or old data or part of each on disk 2008-10-16 20:51 the dma will use the same lock? 2008-10-16 20:51 dma uses no lock 2008-10-16 20:51 hmm... 2008-10-16 20:52 really? 2008-10-16 20:52 once we have sent a page down to the block layer, dma can be initiated at any time 2008-10-16 20:52 really 2008-10-16 20:52 before starting the dma the page is not locked? 2008-10-16 20:52 scary? 2008-10-16 20:52 if you're scared by that you're starting to get it 2008-10-16 20:52 dma doesn't care a bit about page locks 2008-10-16 20:52 hmm... 2008-10-16 20:52 look through all the dma code, you will find no synchronization there 2008-10-16 20:52 I though the OS will take some care... 
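Another toy model shows the write-side asymmetry. The page lock is dropped as soon as the IO is queued. The "DMA" then copies whatever the page holds at the moment it actually runs, so a write that lands in between decides what reaches the disk. In the 2.6.26 source linked above, __block_write_full_page appears to set PG_writeback (set_page_writeback) before unlock_page, and end_page_writeback clears the flag when the IO completes. The toy mirrors that with a writeback flag. All names below are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

struct toy_page {
    bool locked;                     /* models PG_locked */
    bool writeback;                  /* models PG_writeback */
    char data[TOY_PAGE_SIZE];
};

struct toy_disk {
    char block[TOY_PAGE_SIZE];
};

/* Called with the page locked: flag writeback, queue the IO, and drop
 * the page lock right away without waiting for the transfer. */
void toy_write_submit(struct toy_page *p)
{
    p->writeback = true;
    p->locked = false;
}

/* The "DMA" runs whenever the device gets to it and reads the page
 * memory as it is at that moment; no page lock is consulted. */
void toy_write_complete(struct toy_page *p, struct toy_disk *d)
{
    memcpy(d->block, p->data, TOY_PAGE_SIZE);
    p->writeback = false;
}
```

If a task modifies the page after toy_write_submit and before toy_write_complete, the new bytes reach the disk. With a real device the transfer could even catch the page half old, half new.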
2008-10-16 20:52 except with the disk hardware 2008-10-16 20:53 the filesystem should take care all right, but it can't do anything about mmaped writes for example 2008-10-16 20:53 tux3 will take a great deal of care there 2008-10-16 20:53 because we can also be writing out metadata here 2008-10-16 20:54 and it is always a bug to have a racy write to metadata 2008-10-16 20:54 in other words, the synchronization is performed by caller 2008-10-16 20:54 ack :D 2008-10-16 20:54 the vfs/block library can't possibly know enough to do the synchronization itself 2008-10-16 20:55 ok, the rest is just a reading exercise 2008-10-16 20:55 the page locks will be found in generic_* like as for the read case 2008-10-16 20:55 though in some cases they will be buried in filemap functions like grab_cache_page 2008-10-16 20:56 what would you like to look at for next tuesday? 2008-10-16 20:56 my deadline for 22 was canceled so I'll have some time to work on the fs stuff again :P 2008-10-16 20:56 :) 2008-10-16 20:57 well tomorrow I'll probalby get back in the hacking chair 2008-10-16 20:57 maze would know better to answer to that question 2008-10-16 20:57 fix some filemap bugs and maybe that readdir thing 2008-10-16 20:57 let's take a run at mpage 2008-10-16 20:57 what's the question? 2008-10-16 20:57 "what next" 2008-10-16 20:57 flips: what would you like to look at for next tuesday? 2008-10-16 20:57 MaZe: where have you been?? 2008-10-16 20:58 mpage.c I think 2008-10-16 20:58 unfortunately, working since 8am today 2008-10-16 20:58 flips: you are working on tux3 only your free time? 2008-10-16 20:58 so we can see how akpm goes about bypassing what looks like the main IO paths in the kernel 2008-10-16 20:58 I've read through till 8:30 pm 2008-10-16 20:58 razvanm, indeed 2008-10-16 20:59 wow... 2008-10-16 20:59 MaZe: reading? 2008-10-16 20:59 ok, you mentioned my name, so it pinged me and I looked, but I'm not sure what exact question and how to answer it 2008-10-16 21:00 (reading? 
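The vfs/block library leaves synchronization to the caller. So a filesystem that must never write a torn metadata block has to keep the IO source stable itself. One simple technique, shown here as an assumption rather than as what Tux3 does, is to copy the page into a private bounce buffer at submit time. Later modifications to the live page then cannot race with the transfer, at the cost of one copy per page. The other obvious choice is to make writers wait until writeback finishes. All names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

struct toy_page {
    bool writeback;
    char data[TOY_PAGE_SIZE];
};

/* Stable IO source: a private copy frozen at submit time. */
struct toy_io {
    char bounce[TOY_PAGE_SIZE];
};

struct toy_disk {
    char block[TOY_PAGE_SIZE];
};

/* Freeze the current contents; the live page may be modified freely
 * from here on without affecting what gets written. */
void toy_submit_stable(struct toy_page *p, struct toy_io *io)
{
    memcpy(io->bounce, p->data, TOY_PAGE_SIZE);
    p->writeback = true;
}

/* The "DMA" reads from the frozen copy, never from the live page. */
void toy_complete_stable(struct toy_page *p, const struct toy_io *io,
                         struct toy_disk *d)
{
    memcpy(d->block, io->bounce, TOY_PAGE_SIZE);
    p->writeback = false;
}
```

Unlike the previous toy, a modification made while the IO is in flight stays in memory and does not leak into the block being written.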
trying, but not having the time to do it well really) 2008-10-16 21:00 MaZe: the questions was what to talked about next tuesday 2008-10-16 21:00 maze is busy handling a large fire in a google data center 2008-10-16 21:00 at the moment 2008-10-16 21:01 ACTION lies through his teeth 2008-10-16 21:01 fire?!? 2008-10-16 21:01 standard goog joke 2008-10-16 21:01 :-) 2008-10-16 21:01 disk drive caught fire 2008-10-16 21:01 flames leaping into the statosphere 2008-10-16 21:02 causing your gmail to lag by tens of seconds 2008-10-16 21:02 nothing beats the waiting time for loading gmail the first time 2008-10-16 21:02 so about that 10d... 2008-10-16 21:02 when I first noticed the progress bar I though it's a joke :P 2008-10-16 21:03 we got the 10d after a rebel xt :P 2008-10-16 21:03 that's mainly crappy javascript parsing in firefox 2008-10-16 21:03 because I had the chance to hold a 10d in my hand and I fall in love :P 2008-10-16 21:03 rebel xt is a nice machine in its own right 2008-10-16 21:03 there is no time for the progress bar in chrome? 2008-10-16 21:03 but when you get a real camera the difference is obvious 2008-10-16 21:04 I really like that big wheel 2008-10-16 21:04 and the way you can very quickly change the settings 2008-10-16 21:04 chrome's big deal is a faster javascript parser 2008-10-16 21:04 and the much lower shutter lag 2008-10-16 21:04 and the heft 2008-10-16 21:04 that's a big one for me 2008-10-16 21:04 really helps in framing shots 2008-10-16 21:04 you really noticed the diffence in shutter speed? 2008-10-16 21:05 shutter lag 2008-10-16 21:05 lag sorry... 
2008-10-16 21:05 faster data path from the sensor etc 2008-10-16 21:05 faster focus setup (though still sucks) 2008-10-16 21:05 faster motor drive, that's a big one for me 2008-10-16 21:05 changing the focus points is not my main strength :P 2008-10-16 21:06 200 ms on the 20d 2008-10-16 21:06 bigger controls, also a big deal 2008-10-16 21:06 the funny thing, the eye tracking in and old A2E I have is really working :-) 2008-10-16 21:06 a2e? 2008-10-16 21:06 looks similar with the 10d but it's on film 2008-10-16 21:07 I only dabbled in real film a little bit 2008-10-16 21:07 I like being able to take thousands of shots at $0/shot 2008-10-16 21:08 I shoot film after I did on digital 2008-10-16 21:08 is nice 2008-10-16 21:08 in a different way :P 2008-10-16 21:08 yes, I can see that, I don't like waiting to see the results though 2008-10-16 21:08 :D 2008-10-16 21:08 well by now on digital I usually know whether I've got a good shot without looking 2008-10-16 21:08 I also use a ricoh GX100 almost daily 2008-10-16 21:09 digital is awesome when you use flashes :D 2008-10-16 21:09 that fixes a lot, true 2008-10-16 21:09 http://www.canon.com/camera-museum/camera/film/data/1991-1995/1992_eos5_qd.html the A2E 2008-10-16 21:09 about time for me to get a real flash 2008-10-16 21:10 yup! :D 2008-10-16 21:10 I'm goint to take my camera down to the boardwalk tomorrow 2008-10-16 21:10 in abovementioned backpack 2008-10-16 21:10 boardwalk? 2008-10-16 21:10 the sunsets are beyond belief, with the fires going on up the valley 2008-10-16 21:10 venice beach 2008-10-16 21:11 http://www.imagekandi.com/photo/images/Venice-Beach-Board-Walk.jpg 2008-10-16 21:11 aaa 2008-10-16 21:11 awesome :D 2008-10-16 21:11 saw exactly that tonight, except much redder 2008-10-16 21:11 skate through that spot every day 2008-10-16 21:11 stop! 
:D 2008-10-16 21:12 there was a movie being shot just north ;) 2008-10-16 21:12 I skated into the middle of it, the security guy assumed I must be with the crew 2008-10-16 21:12 :-) 2008-10-16 21:12 funny 2008-10-16 21:12 could have gotten myself a free helping at the buffet 2008-10-16 21:12 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-10-16 21:12 but had to get home for tux3 u 2008-10-16 21:13 aaa 2008-10-16 21:13 so later would be better? :D 2008-10-16 21:13 it's fine 2008-10-16 21:13 there will be another shoot 2008-10-16 21:13 and more free food 2008-10-16 21:13 (we had to come home earlier to catch it :P) 2008-10-16 21:13 it was already dark by that time 2008-10-16 21:13 I was asking about the tux3 u :P 2008-10-16 21:13 current time for tux3 u works fine for me 2008-10-16 21:14 do you have flickr stream? :P 2008-10-16 21:14 now, is it good? 2008-10-16 21:14 oh 2008-10-16 21:14 (thanx for the lesson tonight) 2008-10-16 21:14 you mean for for the skate 2008-10-16 21:14 I'll get pix tomorrow 2008-10-16 21:14 funny I never thought of doing that before 2008-10-16 21:14 take it for granted 2008-10-16 21:14 but last few days have been way over the top 2008-10-16 21:15 everybody just stopping and staring with their mouths open 2008-10-16 21:15 :-) 2008-10-16 21:16 got to clean my sensor 2008-10-16 21:16 got one of those visible dust thingies, haven't used it yet 2008-10-16 21:16 but I will hate myself if there is visible dust on my photos tomorrow 2008-10-16 21:17 if you open the aperture then they not be so visible... 2008-10-16 21:17 kind of hard when you're shooting straight into the sun 2008-10-16 21:17 well 2008-10-16 21:17 20d also doesn't have the auto-cleaning stuff... 2008-10-16 21:17 I'll set the shutter fast 2008-10-16 21:17 right 2008-10-16 21:17 yeah... 2008-10-16 21:17 can't do that either 2008-10-16 21:17 all the palm trees will be black 2008-10-16 21:18 so I'll clean the sensor 2008-10-16 21:18 well... 
if the range of light is big there is not much to do anyway... 2008-10-16 21:18 aaa... you could use a filter 2008-10-16 21:18 the main reasons for going to the 40d: 1) 20% more bit fat pixels 2) 3 inch display 2008-10-16 21:18 a gradient filter... 2008-10-16 21:18 everything else I don't really care that much about 2008-10-16 21:19 I think that working with big files is a pain :P 2008-10-16 21:19 I'll bring my filters 2008-10-16 21:19 play around 2008-10-16 21:19 depends on what machine you have though :D 2008-10-16 21:19 cool :P 2008-10-16 21:19 the ee handles those files just fine 2008-10-16 21:19 make a perfect complement to the canon 2008-10-16 21:20 wow! you open the files on that machine??? 2008-10-16 21:20 you can't fill the 16 GB flash in a day 2008-10-16 21:20 works great 2008-10-16 21:20 it's a pretty fast little machine 2008-10-16 21:20 1 GB memory 2008-10-16 21:21 I can't believe it only cost $500, now costs closer to $300 2008-10-16 21:21 nice... 2008-10-16 21:21 it's going to get a big brother pretty soon, eee 1000 2008-10-16 21:22 what I need to be able effectively on the road 2008-10-16 21:22 OT: http://farm4.static.flickr.com/3017/2894438808_e0d5f9bfbb.jpg taken with a cheap old flash and a cheap umbrella 2008-10-16 21:22 the 9 inch keyboard causes some strain 2008-10-16 21:22 the screen is bigger on the 1000? 2008-10-16 21:23 10 inches, and a 92% keyboard 2008-10-16 21:23 crisp indeed 2008-10-16 21:23 time delay? 2008-10-16 21:23 or raluca on the camera maybe 2008-10-16 21:24 time delay? 2008-10-16 21:24 that's a shot of you, no? 2008-10-16 21:25 http://farm4.static.flickr.com/3286/2893639655_3aed45ecd4_b.jpg 2008-10-16 21:25 yup, that was me 2008-10-16 21:25 the other one is one with Ral 2008-10-16 21:25 notice the black border from the bottom 2008-10-16 21:25 arty 2008-10-16 21:25 I used a 250 shutter speed 2008-10-16 21:25 that background is fine 2008-10-16 21:26 the cap is 200 for 10d 2008-10-16 21:26 what's the black at the bottom? 
2008-10-16 21:26 it's the shutter :D 2008-10-16 21:26 so how'd you get 250? 2008-10-16 21:26 ah 2008-10-16 21:26 the 250 was the shutter speed 2008-10-16 21:26 you can tell it do do it, it won't 2008-10-16 21:27 sorry? 2008-10-16 21:27 you can set it to 1/250, but they you get a picture of the shutter, no? 2008-10-16 21:27 and thought the shutter moved left to right 2008-10-16 21:28 not top to bottom 2008-10-16 21:28 I thought I meant 2008-10-16 21:28 it's top to bottom in slr :D 2008-10-16 21:28 ok, and to 1/8000 in the 20d 2008-10-16 21:28 the sync with the flash will be also aroun 1/200 2008-10-16 21:28 or 1/250... 2008-10-16 21:29 you are a more leet photog than me 2008-10-16 21:29 I haven't even gotten into flash sync let 2008-10-16 21:29 yet 2008-10-16 21:29 btw: in my lab the colors are very nice with a color balance of 4000K 2008-10-16 21:30 lab? 2008-10-16 21:30 the office 2008-10-16 21:30 let me see, it means everything is very blue there? 2008-10-16 21:31 we have fluorescent light 2008-10-16 21:31 condolences 2008-10-16 21:31 I use to shoot using the fluorescent setting but the 4000K is much better 2008-10-16 21:31 just twist the tubes out 2008-10-16 21:31 which have 5500K? like the natural light? 2008-10-16 21:31 is that what it is? 
2008-10-16 21:32 (the flash sync for 20d is 1/250, congrats :P) 2008-10-16 21:32 I think the color balance doesn't really affect the camera settings, just the jpg conversion 2008-10-16 21:32 so going on that theory, I always shoot raw and never change the temperature 2008-10-16 21:32 I shoot raw for some time 2008-10-16 21:33 the size of the files and the processing was too much for me :P 2008-10-16 21:33 now I'm using jpg so I need to set it right :D 2008-10-16 21:33 they haven't really improved on the 20d mechanics in the 30d and 40d 2008-10-16 21:33 bad canon 2008-10-16 21:33 still 200 ms/shot is the state of the art for prosumer 2008-10-16 21:34 raw is quite comfortable on the 20d 2008-10-16 21:34 except sometimes you have to wait for a shot while the transfer to flash is in progress 2008-10-16 21:34 do you shoot a lot? 2008-10-16 21:34 can fix that with a faster flash card 2008-10-16 21:34 is 5,000 shots/year a lot? 2008-10-16 21:35 not really... 2008-10-16 21:35 right 2008-10-16 21:35 more than most people, less than a true photog 2008-10-16 21:35 we have about 1000 per month 2008-10-16 21:35 that's what I did when I first got it 2008-10-16 21:36 (shoot at 4000K: http://farm4.static.flickr.com/3223/2893706927_bbfc0360d7_b.jpg ) 2008-10-16 21:36 we took so far 33K of pictures... 2008-10-16 21:36 major boca 2008-10-16 21:36 from mid 2003 till now 2008-10-16 21:36 major boca? 2008-10-16 21:37 fuzzy background 2008-10-16 21:37 did I spell that right? 2008-10-16 21:37 spanish? :D 2008-10-16 21:37 camera term 2008-10-16 21:38 we use a cheap 50mm f1.8 2008-10-16 21:38 best for boca 2008-10-16 21:38 bokeh? 2008-10-16 21:38 prime 2008-10-16 21:38 right 2008-10-16 21:38 http://en.wikipedia.org/wiki/Bokeh ? 
2008-10-16 21:38 that's the one 2008-10-16 21:38 we only have two zoom lens 2008-10-16 21:38 considered high art 2008-10-16 21:39 all the rest are primes 2008-10-16 21:39 I have yet to get a prime lens 2008-10-16 21:39 just been lazy 2008-10-16 21:39 I like primes because you don't have the problem of zooming ;-) 2008-10-16 21:39 I shoot with this almost exclusively: http://www.the-digital-picture.com/reviews/Canon-EF-S-17-55mm-f-2.8-IS-USM-Lens-Review.aspx 2008-10-16 21:40 barely fits in my holster bag 2008-10-16 21:40 nice!! 2008-10-16 21:40 2.8... IS :D 2008-10-16 21:40 get lots of attention 2008-10-16 21:40 people wonder what is the point of all that glass 2008-10-16 21:40 one of our zooms is the 17-40 F4 :P 2008-10-16 21:40 weighs a kilo, more than the camera 2008-10-16 21:40 the 2.8 :P 2008-10-16 21:41 http://www.flickr.com/gp/46249124@N00/U5gk98 some of the speakers from our CS Seminar 2008-10-16 21:41 can do some nice things with the IS 2008-10-16 21:41 I use the 85mm f1.8 2008-10-16 21:41 like shoot without flash in dim light 2008-10-16 21:41 I don't have any IS lens 2008-10-16 21:41 cool :D 2008-10-16 21:41 anyway, time to go do family stuff 2008-10-16 21:42 I'll be back working on the next post later 2008-10-16 21:42 have a nice evening! 2008-10-16 21:42 you too 2008-10-16 21:42 thanks for the lesson 2008-10-16 21:42 thanks for coming 2008-10-16 21:42 well... I did the easy thing :P 2008-10-16 21:42 this is eventually going to turn into a book I think 2008-10-16 21:42 :D 2008-10-16 21:42 not many people need to read this book, but those who do need it bad 2008-10-16 21:43 I would love to do a fast fwd to see it ;-) 2008-10-16 22:46 ¨hey 2008-10-16 22:46 hi 2008-10-16 22:47 funny, I've been thinking about getting a canon G10 2008-10-16 22:47 seems to be the biggest bang for the buck at this time 2008-10-16 22:47 flips: how's it going ? 
2008-10-16 22:48 ACTION is playing around with some lockdep/stat related changes to track rq lock contention 2008-10-16 22:48 going fine 2008-10-16 22:48 bh: dslr elitests wont speak of such a camera 2008-10-16 22:49 shapor: really ? don't like it ? 2008-10-16 22:49 powershot says it all 2008-10-16 22:49 or is it the case that it makes their purchase look bad ? 2008-10-16 22:49 ACTION is not a dslr elitest like flips 2008-10-16 22:49 20d can be had for $450 now 2008-10-16 22:49 now reason not to get a real camera 2008-10-16 22:49 yeah, but that's older technology 2008-10-16 22:49 point and shoots are much better 2008-10-16 22:50 beats heck out of any point n shoot 2008-10-16 22:50 more likely to have it with you when you want it 2008-10-16 22:50 because of optics ? 2008-10-16 22:50 still gets oohs an ahs pretty much every time it comes out of the bag 2008-10-16 22:50 like with a normal 50mm lens and stuff ? 2008-10-16 22:50 most people on the dslr bandwagon are poor photographers who dont even use 1% of the 100's of features their cameras have 2008-10-16 22:50 I'm interested in night photography, indoor club stuff 2008-10-16 22:50 posers... 2008-10-16 22:50 shapor: I agree 2008-10-16 22:51 shapor, and most people aren't posers period? 2008-10-16 22:51 which is why I went with a good consumer casio 2008-10-16 22:51 why limit the discussion to photo posers? 2008-10-16 22:51 it's done well for me and taken tons of punishment from the playa, etc... 2008-10-16 22:51 also its a huge plus being able to have a camera you dont mind dropping or taking on a camping trip for fear of getting ruined 2008-10-16 22:51 I'm never going to get any really expensive lens so I'm seriously thinking about a G10 2008-10-16 22:52 or be burdened with its weight 2008-10-16 22:52 something about that big fat Ka-LICK is addictive 2008-10-16 22:52 my gf has a nikon dslr w/a few $1500+ lenses 2008-10-16 22:52 its more of a hassle than anything else 2008-10-16 22:52 nikon... 
2008-10-16 22:52 gotta have a rediculously big tripod to hold it stable 2008-10-16 22:53 totaly not worth it 2008-10-16 22:53 I still manage to fit mine in a holster 2008-10-16 22:53 just barely 2008-10-16 22:53 kind of bulges 2008-10-16 22:53 my sony that fits in my pocket i can prop up on my jacket and take 30s exposures with has gotten much better use 2008-10-16 22:53 can the ex-pro photographer weigh in here? 2008-10-16 22:54 get the lumix or the leica 2008-10-16 22:54 pro's have to stay quiet ;) 2008-10-16 22:54 same optics 2008-10-16 22:54 flips: see even the pro agrees :P 2008-10-16 22:54 great compact camera 2008-10-16 22:54 yeah i considered the lumix 2008-10-16 22:54 wide lens is nice 2008-10-16 22:54 he's just trying not to shame you in public ;) 2008-10-16 22:55 i found their noise reduction a bit dated 2008-10-16 22:55 lumix? 2008-10-16 22:55 yeah 2008-10-16 22:55 uh oh 2008-10-16 22:55 on the panasonic anyway 2008-10-16 22:55 persie 2008-10-16 22:55 everybody knows that in the circles timothy used to move in there are only canon and nikon marks to be seen 2008-10-16 22:55 i think the leica has different software 2008-10-16 22:55 although i may be wrong 2008-10-16 22:55 it does 2008-10-16 22:55 and its better 2008-10-16 22:55 yeah 2008-10-16 22:55 that's why I bought it 2008-10-16 22:56 $100 more or so 2008-10-16 22:56 noise reduction should be done offline 2008-10-16 22:56 let's argue about roller skates next 2008-10-16 22:56 flips: how often do you shoot raw? 2008-10-16 22:57 its a pita 2008-10-16 22:57 big files 2008-10-16 22:57 with post processing 2008-10-16 22:57 if you want to do anything with it 2008-10-16 22:57 do any of the cameras do lossless compression? 
2008-10-16 22:58 i alwys wondered why they didnt 2008-10-16 22:58 shapor, always shoot raw 2008-10-16 22:58 some do lzw 2008-10-16 22:58 i always shoot jpegs 2008-10-16 22:58 unless its for money 2008-10-16 22:58 yeah i can fill a 4GB card with jpgs between dumps 2008-10-16 22:58 raws would kill that 2008-10-16 22:59 flips: how's tux3 development going. Seeing that various bug fixes went in about a week ago, but I'm assuming that you're working on other stuff that's yet to be committed. 2008-10-16 22:59 I shoot about 200 pics a day on the theory it's not a video camera 2008-10-16 22:59 i shoot with my iPhone just to annoy my linux geek friends 2008-10-16 22:59 bh, working on a follow up atomic commit degisn post 2008-10-16 22:59 tim_dimm: :) 2008-10-16 23:00 flips: so you're working on atomic commits now ? 2008-10-16 23:00 yes 2008-10-16 23:00 last big thing before kernel port 2008-10-16 23:01 tim_dimm, you got off some decent shots in spite of the beyond belief shutter lag 2008-10-16 23:01 noisy, but in focus 2008-10-16 23:01 the 3G has a much nicer camera 2008-10-16 23:02 better shadow detail 2008-10-16 23:02 nicer lens 2008-10-16 23:02 get a gphone 2008-10-16 23:02 ACTION hides 2008-10-16 23:02 its getting ripped for usability 2008-10-16 23:02 too many buttons 2008-10-16 23:02 remember, I've been using a one button mouse for years 2008-10-16 23:02 http://shapor.com/pics/trips/out_west-2004-10-11/vegas/.html/IMAG0249.JPG.html 2008-10-16 23:02 ;-) 2008-10-16 23:02 macheads get confused by buttons I know 2008-10-16 23:02 ^ reason i use a cheap camera 2008-10-16 23:02 peeceers like them 2008-10-16 23:02 flips: reading your post now 2008-10-16 23:03 what gear? 2008-10-16 23:03 cause that looks like 12k rpm 2008-10-16 23:03 exposure issues? 2008-10-16 23:03 based on the speedo, 6th 2008-10-16 23:03 (thats the 600) 2008-10-16 23:03 160? 
2008-10-16 23:03 oh 2008-10-16 23:03 146 i think 2008-10-16 23:04 clutch cable is kinda blocking it 2008-10-16 23:04 lcd speedo 2008-10-16 23:04 tim_dimm, if you get a gphone you can use the gps to measure your speed 2008-10-16 23:04 ...maybe 2008-10-16 23:04 actually I think it's just fake cell tower gps 2008-10-16 23:04 i've dropped 3 cameras off the bike now :( 2008-10-16 23:05 explains your attachment to point n shoots 2008-10-16 23:05 i should maybe a tether 2008-10-16 23:05 no emotional involvement 2008-10-16 23:05 make* 2008-10-16 23:06 that was a $69 vivitar from walmart 6 years ago 2008-10-16 23:06 http://shapor.com/pics/trips/out_west-2004-10-11/vegas/.html/IMAG0270.JPG.html 2008-10-16 23:06 thats right off the camera, no effects, heh 2008-10-16 23:06 super slow processing 2008-10-16 23:06 focal plane effect? 2008-10-16 23:06 no 2008-10-16 23:07 must be sensor scanout 2008-10-16 23:07 but... 2008-10-16 23:07 yeah 2008-10-16 23:07 the later pixels would be overexposed if that were the case 2008-10-16 23:07 it may compensate for that 2008-10-16 23:08 dunno 2008-10-16 23:08 that would be a trick 2008-10-16 23:08 what kind of shutter? 2008-10-16 23:08 a $69 walmart one back in 2002 ;) 2008-10-16 23:08 certainly digital 2008-10-16 23:09 digital shutter? 2008-10-16 23:09 ACTION doubts there is such a thing 2008-10-16 23:09 non-mechanical 2008-10-16 23:09 like a phone has, right? 
2008-10-16 23:09 hmm 2008-10-16 23:09 aka cheap 2008-10-16 23:09 hm maybe its not that bad 2008-10-16 23:10 also running off commodity battery is a key feature i look for 2008-10-16 23:10 AA or AAA 2008-10-16 23:11 in a pinch all you do is stop at a gas station 2008-10-16 23:11 instead of this recharging business 2008-10-16 23:11 but the big canon battery will drive the built in flash all day 2008-10-16 23:12 you'd have a bag full of dead aa's 2008-10-16 23:12 actually a pocket full of them 2008-10-16 23:12 ammo for cars who cut you off 2008-10-16 23:12 dual purpose ;) 2008-10-16 23:15 oh, I just need to learn to frame my thoughts the right way 2008-10-16 23:16 :) 2008-10-16 23:16 ACTION heads down to wallmart to pick up a point n shoot and a shopping back fulla aa's 2008-10-16 23:16 flip the evil switch on, i'm sure you have one 2008-10-16 23:16 I parked illegally once 2008-10-16 23:17 I parked legally once 2008-10-16 23:19 trying to decide now whether I should call the thing that has a commit block a quantum or not 2008-10-16 23:21 what would be a better word? 2008-10-16 23:21 phase? 2008-10-16 23:21 phase is already used, a phase is made up of quanta at the moment 2008-10-16 23:21 and an episode is made up of phases 2008-10-16 23:21 whats a quantum made up of? 2008-10-16 23:22 pointers to extents 2008-10-16 23:22 and pointers to parent blocks to plug them into 2008-10-16 23:22 why isn't that just a commit block? 2008-10-16 23:23 there are commit blocks for each of quanta, phases and episodes 2008-10-16 23:23 so it would be a quantum commit block ;) 2008-10-16 23:23 sounds cool for sure 2008-10-16 23:40 -!- pranith(ca4bcee2@webchat.mibbit.com) has joined #tux3 2008-10-16 23:40 hey all 2008-10-16 23:42 hi pranith 2008-10-16 23:49 flips: hello 2008-10-16 23:50 you were going to post a new mail? 2008-10-16 23:50 working on it 2008-10-16 23:50 maybe 30% done 2008-10-16 23:50 what is it about? 
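The log says a quantum commit block holds pointers to newly written extents, plus pointers to the parent blocks they will be plugged into. Here is a rough C sketch of that. Everything in it is hypothetical: the struct names, the field widths, the fixed QUANTUM_MAX capacity, and the choice to return -1 when a quantum is full. It is not Tux3's on-disk format. Phases and episodes would presumably get commit blocks of their own one level up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory picture of a quantum commit block. */
#define QUANTUM_MAX 16

struct extent_ref {          /* a run of newly written blocks */
    uint64_t block;
    uint32_t count;
};

struct plug {                /* the parent pointer that will refer to the extent */
    uint64_t parent;         /* block number of the parent (e.g. a btree node) */
    uint32_t slot;           /* which pointer inside that parent */
};

struct quantum_commit {
    unsigned nr;                           /* entries in use */
    struct extent_ref extent[QUANTUM_MAX];
    struct plug plug[QUANTUM_MAX];         /* plug[i] goes with extent[i] */
};

/* Record one extent and where it gets plugged in.
 * In this sketch -1 simply means the quantum is full. */
static int quantum_add(struct quantum_commit *q, uint64_t block, uint32_t count,
                       uint64_t parent, uint32_t slot)
{
    assert(count > 0);
    if (q->nr == QUANTUM_MAX)
        return -1;
    q->extent[q->nr] = (struct extent_ref){ block, count };
    q->plug[q->nr] = (struct plug){ parent, slot };
    q->nr++;
    return 0;
}

/* Total data blocks described by one quantum commit block. */
static uint64_t quantum_blocks(const struct quantum_commit *q)
{
    uint64_t total = 0;
    for (unsigned i = 0; i < q->nr; i++)
        total += q->extent[i].count;
    return total;
}
```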
2008-10-16 23:51 details of how we get the writeout pattern I wrote about in the previous post 2008-10-16 23:51 hmm 2008-10-16 23:51 pretty much the most important issue besides the versioned pointers 2008-10-16 23:52 ok 2008-10-16 23:52 looking forward to it 2008-10-16 23:52 i was seeing the sandeepranade's time machine implementation using ext3cow... 2008-10-16 23:53 one thing i wanted to ask.. using versioned pointers how many copies can we store at a time? 2008-10-16 23:53 like if i use bittorrent.. the blocks keep changing all the time.. 2008-10-16 23:53 how do you handle such situations? 2008-10-16 23:54 we drop off old versions to make room for new ones 2008-10-16 23:54 without doing anything special, we can store about 500 versions 2008-10-16 23:54 hmm, we keep the versions until we run out of space? 2008-10-16 23:54 or 8,000 if we save some bits as discussed on the list, using the buddy system idea 2008-10-16 23:54 that might lead to fragmentation... 2008-10-16 23:54 you don't have to 2008-10-16 23:54 but you will be able to 2008-10-16 23:54 hmm 2008-10-16 23:55 we did that in zumastor with success 2008-10-16 23:55 so we can tune that number of versions? 2008-10-16 23:55 that part's not designed yet 2008-10-16 23:55 -!- pranith(ca4bcee2@webchat.mibbit.com) has joined #tux3 2008-10-16 23:55 would be very useful though 2008-10-16 23:55 it asnwers the question, how do you avoid enospc when writing to a filesystem holding lots of snapshots 2008-10-16 23:56 hmm... yeah 2008-10-16 23:57 where might we be storing this information? 2008-10-16 23:57 if given as an option? 2008-10-17 00:00 which information? 2008-10-17 00:00 the number of versions to store? 
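Here is a concrete model of the versioned-pointer answer. Data is not copied for each snapshot. Instead, each pointer carries a version label. A version sees the pointer labelled with itself, or failing that, the one labelled with its nearest ancestor in the version tree. In this model, a block that keeps changing (the bittorrent case) costs one extra labelled pointer for each version in which it was rewritten.

The sketch is a simplified in-memory model of that lookup under these assumptions. The tree representation, struct vptr, and the use of 0 to mean a hole are illustrative, not the Tux3 encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_PARENT (-1)
#define MAX_VERSIONS 16

/* Version tree: parent[v] is the version that v was branched from. */
struct version_tree {
    int count;
    int parent[MAX_VERSIONS];
};

/* One versioned pointer: where this logical data lives as of version 'label'. */
struct vptr {
    int label;
    uint64_t block;
};

/* Find the block visible in version v: the pointer labelled v itself or,
 * failing that, its nearest labelled ancestor. Returns 0 (a hole) if none. */
static uint64_t vptr_lookup(const struct version_tree *t,
                            const struct vptr *ptrs, size_t n, int v)
{
    assert(v >= 0 && v < t->count);
    for (int cur = v; cur != NO_PARENT; cur = t->parent[cur])
        for (size_t i = 0; i < n; i++)
            if (ptrs[i].label == cur)
                return ptrs[i].block;
    return 0;
}
```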
2008-10-17 00:02 we hand'e that pretty nicely in zumastor 2008-10-17 00:02 each version has a priority, for any ties you discard the oldest 2008-10-17 00:03 anyway, that would go in the version table, a special file much as the allocation bitmap and atom table are 2008-10-17 00:03 hmm 2008-10-17 01:15 another question is how do you tell the fs your preference to hold a snapshot unless the space is needed 2008-10-17 01:16 more like, how to explain the priority system to an admin 2008-10-17 01:17 would also be useful to control on a per-file or per-directory basis 2008-10-17 01:17 aka "i dont care about snapshots of /var/log" 2008-10-17 01:17 which could burn a lot of our snapshot space 2008-10-17 01:30 shapor: right 2008-10-17 01:52 the argument that "we can't afford to fill up the filesystem because it would fragement to hell" is kind of interesting 2008-10-17 01:52 so how much can we fill the filesystem? 2008-10-17 01:53 why not just leave 5% free and misreport the available space? 2008-10-17 01:56 pranith, shapor, we get to play with new interfaces when we get to snapshots 2008-10-17 01:56 pioneering stuff 2008-10-17 02:04 flips: it would be good to have a professional design document with the things which are new in tux3... 2008-10-17 02:06 would be easy to refer to concepts 2008-10-17 02:07 and *figures* would really help :) 2008-10-17 02:07 shapor's working on it 2008-10-17 02:07 feel free go contribute figures 2008-10-17 02:07 they're quite time consuming to make 2008-10-17 02:07 we need some volunteers 2008-10-17 02:08 yeah.. ill ping him.. 2008-10-17 02:08 shapor: you need any help with figures :D 2008-10-17 02:08 pranith: yes! 2008-10-17 02:08 what exactly are u working on? 2008-10-17 02:08 any draft? 
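The zumastor rule quoted above is: every version has a priority, and ties go to the oldest. It fits in a few lines. In this sketch the version-table entry layout is hypothetical. Treating a lower priority number as "drop first" is also an assumption, because the log does not say which direction the priority runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical version-table entry. */
struct version_entry {
    int live;               /* nonzero while the snapshot still exists */
    int priority;           /* assumption: lower priority is discarded first */
    unsigned long created;  /* creation sequence number: smaller is older */
};

/* Choose which snapshot to drop when space runs out: lowest priority first,
 * oldest among equal priorities. Returns -1 if nothing is left to drop. */
static int pick_victim(const struct version_entry *tab, size_t n)
{
    int victim = -1;

    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct version_entry *e = &tab[i];
        if (!e->live)
            continue;
        if (victim < 0 ||
            e->priority < tab[victim].priority ||
            (e->priority == tab[victim].priority &&
             e->created < tab[victim].created))
            victim = (int)i;
    }
    return victim;
}
```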
2008-10-17 02:09 very rough, its in the repo 2008-10-17 02:09 http://shapor.com/tux3/shapor-tux3/doc/design.html 2008-10-17 02:09 hmm 2008-10-17 02:10 no figures at all :( 2008-10-17 02:11 yeah, it would be useful for dleaf 2008-10-17 02:11 ok, i think this is worth doing.. 2008-10-17 02:11 very useful 2008-10-17 02:11 the only way i got it was flips drawing it on a whiteboard 2008-10-17 02:11 oh 2008-10-17 02:11 -!- pgquiles_(~pgquiles@121.Red-88-16-37.dynamicIP.rima-tde.net) has joined #tux3 2008-10-17 02:12 can you give me a picture or something.. 2008-10-17 02:12 ill make a figure from that? 2008-10-17 02:12 could 2008-10-17 02:12 hmm, great! 2008-10-17 02:13 perhaps this weekend 2008-10-17 02:13 tomorrow i'm going to be busy with work 2008-10-17 02:13 er later today i guess 2008-10-17 02:13 its 2am here 2008-10-17 02:14 oh 2008-10-17 02:14 hmm 2008-10-17 02:14 ok 2008-10-17 02:14 you mind mailing it to me? 2008-10-17 02:14 when u have it? 2008-10-17 02:15 sure 2008-10-17 02:15 okies 2008-10-17 02:20 hello 2008-10-17 02:20 can I ask for forward logging? 2008-10-17 02:21 are there any papers of detail? 2008-10-17 02:24 hirofumi: mail logs i guess 2008-10-17 02:24 ah 2008-10-17 02:25 but, that hard to understand all of detail for me 2008-10-17 02:26 btw, i have some patches. it should go to mailing list? 2008-10-17 02:26 for user/test/* 2008-10-17 02:32 hirofumi: yeah 2008-10-17 02:33 post them to the mailing list 2008-10-17 02:33 ok, thanks. 2008-10-17 02:35 hirofumi: what are those patches related to? 2008-10-17 02:37 little bug fixes, and draw tux3 intenal graph for newbie like me 2008-10-17 02:37 internal graph -> internal format 2008-10-17 02:37 internal graph as in a figure? 2008-10-17 02:37 you have a figure for that? 2008-10-17 02:38 I think some kind of 2008-10-17 02:38 draw by graphviz 2008-10-17 02:39 but, not perfect 2008-10-17 02:39 hmm 2008-10-17 02:39 something is better than nothing 2008-10-17 02:39 :) 2008-10-17 02:39 you posting it to the list? 
2008-10-17 02:39 yeah :) 2008-10-17 02:40 yes 2008-10-17 02:40 ok, waiting for that... 2008-10-17 02:40 ok, I'll post those soon 2008-10-17 02:43 hirofumi, I have written a little bit about it 2008-10-17 02:43 some discussion with matt dillon 2008-10-17 02:43 some in the initial post 2008-10-17 02:43 and there will be some more in tomorrow's post 2008-10-17 02:44 oh, good. btw, what is big difference with usual jornaling? 2008-10-17 02:44 it puts the commit blocks all over the disk, not in a fixed place 2008-10-17 02:45 and it isn't limited to the size of the journal 2008-10-17 02:45 i see. and top of commit is pointed by superblock? 2008-10-17 02:47 btw, i posted some patches. could you check those? 2008-10-17 02:47 yes, re superblock 2008-10-17 02:48 and next commit is pointed by previous commit? 2008-10-17 02:50 yes 2008-10-17 02:51 I'll explain the details tomorrow 2008-10-17 02:51 your patches are looking good 2008-10-17 02:51 i see. thanks. btw, where does it come from? original? 2008-10-17 02:51 thanks 2008-10-17 02:51 I won't get through all ten tonight 2008-10-17 02:51 but by tomorrow 2008-10-17 02:52 original 2008-10-17 02:52 ah, I googled many times. 2008-10-17 02:53 heh 2008-10-17 02:54 hirofumi, you work for ntt or something like that? 2008-10-17 02:54 no 2008-10-17 02:55 student? sysadmin? 2008-10-17 02:55 flips: funny question.. why did u suspect so? 2008-10-17 02:55 foggy memory 2008-10-17 02:55 I'm finding next office, so I have some time for now. 2008-10-17 02:56 going to try that graph drawing patch right now 2008-10-17 02:56 btw, I sent example as reply 2008-10-17 03:03 wow 2008-10-17 03:03 that is beyond awesome 2008-10-17 03:04 pranith, I guess we have some figures now 2008-10-17 03:04 yippeee 2008-10-17 03:04 :) 2008-10-17 03:04 thanks :) kudos to graphviz 2008-10-17 03:06 wow.. 2008-10-17 03:06 the figure looks great.. 
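The forward-logging answer above says the superblock points at the first commit and each commit points at the next. That suggests a replay loop along these lines. This is a minimal sketch, assuming each commit block records two things: a sequence number, and the location reserved in advance for its successor. The walk then stops at the first slot that was reserved but never written. COMMIT_MAGIC, the field layout and the sequence-number test are hypothetical. A real log would presumably also checksum each block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COMMIT_MAGIC 0x54555833u  /* "TUX3" in ASCII; hypothetical */

/* Hypothetical commit block: only the fields the chain walk needs. */
struct commit_block {
    uint32_t magic;
    uint64_t seq;   /* one more than the previous commit */
    uint64_t next;  /* block chosen in advance for the following commit */
};

/* Follow the chain from the commit named by the superblock. The walk ends at
 * the first block without the expected magic and sequence number, i.e. the
 * slot the next commit would have used but never wrote.
 * Returns how many valid commits were found. */
static size_t walk_commits(const struct commit_block *disk, size_t nblocks,
                           uint64_t start, uint64_t first_seq)
{
    size_t found = 0;
    uint64_t where = start, seq = first_seq;

    assert(disk != NULL);
    while (where < nblocks) {
        const struct commit_block *c = &disk[where];
        if (c->magic != COMMIT_MAGIC || c->seq != seq)
            break;
        found++;
        seq++;
        where = c->next;
    }
    return found;
}
```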
2008-10-17 03:06 hirofumi: thanks 2008-10-17 03:07 all most work by graphviz 2008-10-17 03:07 :) 2008-10-17 03:07 graphviz is just great 2008-10-17 03:09 so far it has not made a graph for me 2008-10-17 03:09 where does it write the output? 2008-10-17 03:10 if it's test/tux3.img, tux3graph should output test/test3.img.dot 2008-10-17 03:10 ok, it put it in /tmp 2008-10-17 03:10 it just add ".dot" postfix 2008-10-17 03:11 and "-v" is for full dump 2008-10-17 03:22 I used the comand: cat testdev.dot | dot -Tpng >testdev.png 2008-10-17 03:22 couldn't get dot to work otherwise 2008-10-17 03:22 I don't understand the syntax 2008-10-17 03:23 hirofumi, what name would you like for out "about us" page? 2008-10-17 03:23 we're listing all contributers 2008-10-17 03:23 and you just became one 2008-10-17 03:23 big time 2008-10-17 03:24 thanks. OGAWA Hirofumi 2008-10-17 03:24 it commented in tux3graph 2008-10-17 03:25 dot -Tpng -O foo.dot 2008-10-17 03:25 ogawa is your family name? 2008-10-17 03:25 yes 2008-10-17 03:25 you want all caps? 2008-10-17 03:26 yes, it may indicate it's family name 2008-10-17 03:26 my version of dot is too old to support -O I guess 2008-10-17 03:26 i see. i'll check... 2008-10-17 03:26 you could also write Ogawa, Hirofumi 2008-10-17 03:27 English way of showing family name 2008-10-17 03:27 for example, Phillips, Daniel 2008-10-17 03:27 I'm sure that's the problem 2008-10-17 03:27 this machine is running Etch 2008-10-17 03:27 so the command syntax I used works 2008-10-17 03:28 English way is Hirofumi Ogawa? I'm not sure 2008-10-17 03:28 i see. I'm using testing (lenny) 2008-10-17 03:29 normally I write Daniel Phillips, but it's also common to write Phillips, Daniel 2008-10-17 03:29 and the comma shows which is the family name 2008-10-17 03:29 Daniel is family name? 2008-10-17 03:30 oops, Phillips is family name? 
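tux3graph writes a .dot text file, which Graphviz's dot then renders. For anyone who has not seen the format, here is a small C sketch that emits a DOT digraph from a list of edges. The edge list and node names are made up, and this is not how tux3graph itself is written.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical edge between two named blocks, e.g. "superblock" -> "itable". */
struct edge {
    const char *from;
    const char *to;
};

/* Write a Graphviz digraph. Names are emitted as quoted DOT IDs; to keep the
 * sketch short, names containing a double quote are rejected, not escaped.
 * Returns 0 on success, -1 on a rejected name or a write error. */
static int write_dot(FILE *out, const char *name,
                     const struct edge *edges, size_t n)
{
    assert(out != NULL && name != NULL);
    if (strchr(name, '"') || fprintf(out, "digraph \"%s\" {\n", name) < 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (strchr(edges[i].from, '"') || strchr(edges[i].to, '"'))
            return -1;
        if (fprintf(out, "    \"%s\" -> \"%s\";\n", edges[i].from, edges[i].to) < 0)
            return -1;
    }
    return fprintf(out, "}\n") < 0 ? -1 : 0;
}
```

The output renders with the same `dot -Tpng` invocation used in the log.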
2008-10-17 03:39 right 2008-10-17 03:39 all your changes look good 2008-10-17 03:39 I'll get some sleep first though before I start merging them 2008-10-17 03:40 that graph think is really sweet 2008-10-17 03:40 graph thing 2008-10-17 03:41 oh yay, the eee is booting off the dvd 2008-10-17 03:43 ok, it seems I could install centos on the eee 2008-10-17 03:43 not going to though 2008-10-17 03:51 Werror is killing tux3fuse... 2008-10-17 03:51 i think its better we dont use -Werror for now... 2008-10-17 04:00 sure 2008-10-17 04:00 what are the problematic warnings? 2008-10-17 04:00 posting... 2008-10-17 04:00 time for me to get some zzz's 2008-10-17 04:01 1 min 2008-10-17 04:01 -!- bobby(~bobby@nat-inn.mentorg.com) has joined #tux3 2008-10-17 04:02 i dint understand 2008-10-17 04:02 from the code... 2008-10-17 04:02 cc1: warnings being treated as errors 2008-10-17 04:02 tux3fuse.c: In function ‘tux3_lookup’: 2008-10-17 04:02 tux3fuse.c:78: warning: initialized field overwritten 2008-10-17 04:02 tux3fuse.c:78: warning: (near initialization for ‘ep.attr’) 2008-10-17 04:02 tux3fuse.c:79: warning: initialized field overwritten 2008-10-17 04:02 tux3fuse.c:79: warning: (near initialization for ‘ep.attr’) 2008-10-17 04:02 tux3fuse.c:80: warning: initialized field overwritten 2008-10-17 04:02 tux3fuse.c:80: warning: (near initialization for ‘ep.attr’) 2008-10-17 04:02 the code looks ok to me 2008-10-17 04:06 yes, I don't know what it means by "initialized field overwritten" 2008-10-17 04:07 yup, am on ubuntu, gcc verson 4.2.3 2008-10-17 04:07 but we can write that differently 2008-10-17 04:07 in a what that makes it happier maybe 2008-10-17 04:07 like? 2008-10-17 04:07 .attr = { .st_mod = ... }, 2008-10-17 04:07 hmm 2008-10-17 04:07 ok, am trying 2008-10-17 04:09 man I am going to be busy merging patches tomorrow 2008-10-17 04:10 yup, thats working 2008-10-17 04:10 submit a patch? 
2008-10-17 04:16 sure 2008-10-17 04:16 I think the compiler was wrong actually 2008-10-17 04:16 about the initialized field overwritten 2008-10-17 04:16 might also be worth a gcc bug report 2008-10-17 04:16 hmm 2008-10-17 04:17 ohk... 2008-10-17 04:17 or at least a post to the gcc list 2008-10-17 04:17 ok, will do that 2008-10-17 04:17 just the struct and the error message, ask if it's worth a bug report 2008-10-17 04:17 struct init I meant 2008-10-17 04:18 ok, doing it now.. 2008-10-17 04:19 flips: iattr.c:109: warning: left shift count >= width of type 2008-10-17 04:19 im sure this warning is also not genuine 2008-10-17 04:19 this is with gcc 3.4.6 on rhel 5 2008-10-17 04:20 ACTION looks 2008-10-17 04:21 does it still give the warning with (((u64)root->depth) << 48) ? 2008-10-17 04:23 yup 2008-10-17 04:23 did that :) 2008-10-17 04:23 still gives the same 2008-10-17 04:23 warning 2008-10-17 04:26 seems like the compiler is wrong indeed 2008-10-17 04:26 yup, done with the mail to gcc and gcc-bugs 2008-10-17 04:26 submitting patch to tux3... 2008-10-17 04:27 anyway, it is evil to give a warning when the shift count is == to the object size 2008-10-17 04:27 that is perfectly legitimate in many cases 2008-10-17 04:27 idiot compiler hackers ;) 2008-10-17 04:27 lol 2008-10-17 04:27 but hats off to them 2008-10-17 04:27 that too 2008-10-17 04:27 i find compiler writing a boring thing 2008-10-17 04:28 the price is right as well 2008-10-17 04:28 but storage is exciting? 2008-10-17 04:28 gets more attention anyway 2008-10-17 04:28 compiler only gets attention when it doesn't work 2008-10-17 04:28 hehe 2008-10-17 04:28 ok, I really have to sleep 2008-10-17 04:28 storage is THE thing 2008-10-17 04:29 hehe, gudnite 2008-10-17 04:29 thanks for all the great work everybody 2008-10-17 04:35 woohoo 2008-10-17 04:35 http://tux3.org 2008-10-17 04:36 i'm sorry, frames are so 1997 2008-10-17 04:41 new look? 
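On the shift warning: ISO C leaves a shift undefined when the count is greater than or equal to the width of the promoted left operand. A count equal to the object size is therefore not portable, even where the hardware happens to give a sensible answer, so the cast has to happen before the shift.

The sketch assumes, based only on the `(u64)root->depth << 48` expression, that depth sits in the top 16 bits of a 64-bit word, with a 48-bit block number below it. DEPTH_SHIFT, BLOCK_MASK and the helper names are hypothetical. If the warning survives the cast, it is worth checking whether the u64 typedef is really 64 bits wide on that build.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packing: depth in the top 16 bits, 48-bit block number below. */
#define DEPTH_SHIFT 48
#define BLOCK_MASK ((UINT64_C(1) << DEPTH_SHIFT) - 1)

static uint64_t pack_root(unsigned depth, uint64_t block)
{
    assert(depth < (1u << 16) && block <= BLOCK_MASK);
    /* Widen before shifting: a 32-bit depth shifted by 48 would be undefined. */
    return ((uint64_t)depth << DEPTH_SHIFT) | block;
}

static unsigned root_depth(uint64_t packed)
{
    return (unsigned)(packed >> DEPTH_SHIFT);
}

static uint64_t root_block(uint64_t packed)
{
    return packed & BLOCK_MASK;
}

/* x << count where count may equal the width: plain << is undefined for
 * count >= 64, so that case is handled explicitly. */
static uint64_t shl64(uint64_t x, unsigned count)
{
    return count >= 64 ? 0 : x << count;
}
```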
2008-10-17 04:43 shapor: still waiting for the dleaf picture :) 2008-10-17 04:43 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-17 05:16 -!- FelipeS(~Felipe@lawn-128-61-26-125.lawn.gatech.edu) has joined #tux3 2008-10-17 06:10 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-17 06:55 -!- data(~data@echo489.server4you.de) has joined #tux3 2008-10-17 07:41 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-17 07:52 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-17 08:07 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-17 08:13 -!- cydork_(~cydoork@122.169.100.164) has joined #tux3 2008-10-17 08:55 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-17 08:57 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-17 09:38 -!- prani(~bobby@122.163.49.185) has joined #tux3 2008-10-17 09:38 anyone here/ 2008-10-17 09:52 -!- prani(~bobby@122.163.49.185) has joined #tux3 2008-10-17 10:07 -!- mingming(~mingming@c-24-22-117-202.hsd1.or.comcast.net) has joined #tux3 2008-10-17 10:07 -!- prani(~bobby@122.163.49.185) has joined #tux3 2008-10-17 10:16 -!- prani(~bobby@122.163.49.185) has joined #tux3 2008-10-17 10:27 hmmm 2008-10-17 10:44 ponk 2008-10-17 11:00 grrr... switching to the latest 2.6 head is not painless :| 2008-10-17 11:13 who the hack is CONFIG_X86_WP_WORKS_OK? 2008-10-17 11:39 yeah... I'll stick with 2.6.26 for now :| 2008-10-17 12:17 hmmm 2008-10-17 12:17 been trying to figure out the leak reported by valgrind... 2008-10-17 12:17 hirofumi just posted a patch... 2008-10-17 12:21 ACTION added tar.gz snapshots of the hg repo for download on the tux3.org page 2008-10-17 12:39 shapor, hello 2008-10-17 12:40 ACTION really needs to understand the dleaf format 2008-10-17 13:12 prani, it's not memory leak. it read the memory beyond buffer size (e.g. rewind.group points the tail of buffer if leaf->groups == 0). 
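hirofumi's diagnosis above is that the problem was not a leak but a read past the buffer when leaf->groups == 0. That is a classic hazard with tables stored at the tail of a block. The sketch shows the pattern with a toy leaf whose group entries occupy the end of a fixed array. With zero groups, the computed start of the table is one past the end: a valid pointer value that must never be dereferenced. struct toy_leaf, MAX_GROUPS and the helpers are hypothetical and much simpler than the real dleaf code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GROUPS 8

struct group {
    uint16_t count;  /* entries in this group */
};

/* Hypothetical leaf: live groups occupy table[MAX_GROUPS - groups .. MAX_GROUPS - 1]. */
struct toy_leaf {
    unsigned groups;
    struct group table[MAX_GROUPS];
};

/* Lowest-address live group, or NULL if there are none. Without the check,
 * table + MAX_GROUPS is one past the end: legal to form, not legal to read. */
static const struct group *first_group(const struct toy_leaf *leaf)
{
    assert(leaf->groups <= MAX_GROUPS);
    if (leaf->groups == 0)
        return NULL;
    return leaf->table + (MAX_GROUPS - leaf->groups);
}

/* Sum entry counts over all live groups; safe for an empty leaf. */
static unsigned total_entries(const struct toy_leaf *leaf)
{
    const struct group *g = first_group(leaf);
    unsigned sum = 0;

    for (unsigned i = 0; g != NULL && i < leaf->groups; i++)
        sum += g[i].count;
    return sum;
}
```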
2008-10-17 13:57 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-17 16:25 ACTION getting closer to working on merges 2008-10-17 16:26 hirofumi, are you able to expose your mercurial repository on the net? 2008-10-17 16:27 anybody, what's a decent fast way to make diagrams on linux, so I can diagram the dleaf format? 2008-10-17 16:27 I've used xfig, it's powerful but weird 2008-10-17 16:27 and the dia clone, I forget what it's called 2008-10-17 16:27 name changed somewhere along the way 2008-10-17 16:27 pretty lame 2008-10-17 16:27 inkscape... tedious 2008-10-17 16:28 maybe I'll try inkscape again 2008-10-17 16:28 I like xfig... 2008-10-17 16:28 I like it, but I always end up tearing my hair out 2008-10-17 16:28 I find hard to control things in inkscape... perhaps I'm too old :| 2008-10-17 16:28 and it's really a lot of work making slick diagrams with it 2008-10-17 16:28 what are you trying to diagram? 2008-10-17 16:29 the dleaf format 2008-10-17 16:29 like I did no the whiteboard 2008-10-17 16:29 you need an expert in illustrator to do that? 2008-10-17 16:29 maybe shapor can remembe rthe whiteboard diagram and come up with some picture 2008-10-17 16:29 have you took a picture of the whiteboard? :D 2008-10-17 16:29 tim_dimm, would help 2008-10-17 16:29 I did 2008-10-17 16:29 I'm game 2008-10-17 16:29 in my phone 2008-10-17 16:29 can get it out with bluetooth 2008-10-17 16:30 OmniGraffle is quite nice for some stuff... 2008-10-17 16:30 ok, later for that one 2008-10-17 16:30 nice name 2008-10-17 16:30 if you can get it out to me, I'll work it out 2008-10-17 16:30 http://www.omnigroup.com/applications/OmniGraffle/ :P 2008-10-17 16:31 including pictures from there in papers doesn't work too well... 
:| 2008-10-17 16:31 need to do any diagrams with an opensource tool chain 2008-10-17 16:31 or at least the file formats have to be open and editable by open tools 2008-10-17 16:32 then inkscape might be the best option 2008-10-17 16:32 quite possibly 2008-10-17 16:32 do it in 3D with blender 2008-10-17 16:32 ;-) 2008-10-17 16:33 what?? :D 2008-10-17 16:33 3D graphics tool 2008-10-17 16:33 I know about it 2008-10-17 16:33 in 3D text mode with aalib 2008-10-17 16:33 looks like a big hammer for this 2008-10-17 16:33 tim was joking 2008-10-17 16:33 ascii?? :D 2008-10-17 16:34 I was joking, sorta 2008-10-17 16:34 anybody who hasn't run quake in text mode needs to 2008-10-17 16:34 with the complexity of your design, how cool would it be to explore it in 3D space? 2008-10-17 16:34 I actually run it once some time ago... 2008-10-17 16:35 it was doom II perhaps... don't remember exactly 2008-10-17 16:35 tim_dimm, sure, every project benefits from adding some smoke and caustics to their data structure diagrams 2008-10-17 16:35 physics 2008-10-17 16:37 povray anyone? :P 2008-10-17 16:39 I'm holding out for a shader to emboss them in 3D and I can blow them to dust with my bfg 2008-10-17 16:41 inkscape package is broken on etch, I'm going back to my tech post 2008-10-17 16:49 OT: http://ozviz.wasp.uwa.edu.au/~pbourke/miscellaneous/scifigure/ 2008-10-17 16:50 I'm sure tim was only half joking 2008-10-17 16:50 we really need our diagrams sitting on shiny disks like that 2008-10-17 16:52 ok ok ok ok :D 2008-10-17 16:54 i could output a library of images 2008-10-17 16:55 you could arrange them any way you want with little lines and everything 2008-10-17 16:55 got to get down to the strand with my camera tonight 2008-10-17 16:56 get some of that brushfire sunset 2008-10-17 16:56 show you point n shoot lamers what a camera can do ;) 2008-10-17 16:56 oh, should I break out the 8x10 camera? 
2008-10-17 16:56 only if it's a point n shoot 2008-10-17 16:57 point, pull focus, point again, pull focus some more, then polaroid, then shoot, but only one sheet at a time 2008-10-17 16:57 plus it won't fit in my pocket 2008-10-17 16:57 wish I never sold that thing 2008-10-17 16:58 used to shoot portraits with it 2008-10-17 16:58 ACTION only has medium format cameras :P 2008-10-17 16:58 ah, found my filters 2008-10-17 16:58 polaroid was discontinued, right? 2008-10-17 16:59 flips: did you download the picture of the whiteboard? 2008-10-17 16:59 razvanm, not yet, I need to transfer it to somebody with bluetooth 2008-10-17 16:59 get it out of the t-mobile feature phone 2008-10-17 16:59 probably to tim's ipod 2008-10-17 17:00 or my laptop 2008-10-17 17:00 or maybe mac 2008-10-17 17:00 right 2008-10-17 17:00 :D 2008-10-17 17:00 I'll suspend any mac trashing during the transfer process 2008-10-17 17:11 folks 2008-10-17 17:11 flips: so you do have/use a mac? 2008-10-17 17:12 razvanm, not me 2008-10-17 17:12 being a linux apostle and all 2008-10-17 17:12 :D 2008-10-17 17:13 besides, I have too many fingers for a one button mouse 2008-10-17 17:13 and too many brain cells ;) 2008-10-17 17:13 ACTION ducks 2008-10-17 17:13 :D 2008-10-17 17:14 I got one finger for you ;-) 2008-10-17 17:14 http://osxbook.com/ 2008-10-17 17:14 huge book 2008-10-17 17:14 a googler also ;-) 2008-10-17 17:14 ACTION invokes his invisibility spell 2008-10-17 17:15 it seems to work ;-) 2008-10-17 17:17 ACTION has a question about d_child and d_subdirs 2008-10-17 17:18 sk8 oclock 2008-10-17 17:18 enjoy 2008-10-17 17:18 and take nice pictures :P 2008-10-17 17:19 I'm just thinking how well skates and camera combine 2008-10-17 17:19 rapid change of viewpoint 2008-10-17 17:19 great way to destroy equipment too 2008-10-17 17:19 perfect ;-) 2008-10-17 17:19 true also 2008-10-17 18:48 ACTION is exploring dentry_unused... 
2008-10-17 19:39 usbdevfs is seriously braindamaged 2008-10-17 19:39 terminally fragile 2008-10-17 19:42 hmm, connect(3, {sa_family=AF_FILE, path="/var/run/dbus/system_bus_socket"}, 33) = 0 2008-10-17 19:43 dbus getting involved in anything gives me a sense of impending doom 2008-10-17 19:54 http://tux3.org/woohoo.jpg 2008-10-17 20:00 folks 2008-10-17 20:05 -!- prani(~bobby@122.162.70.80) has joined #tux3 2008-10-17 20:07 hello 2008-10-17 20:09 file:///var/www/tux3/paradise.jpg 2008-10-17 20:09 check it out 2008-10-17 20:09 huh 2008-10-17 20:09 var? 2008-10-17 20:09 file? 2008-10-17 20:09 whoops 2008-10-17 20:09 hehe 2008-10-17 20:09 http://tux3.org/paradise.jpg 2008-10-17 20:09 :) 2008-10-17 20:11 is that a shooting star? 2008-10-17 20:11 its too bright for that... 2008-10-17 20:12 a meteoroid i guess.. 2008-10-17 20:12 jet 2008-10-17 20:12 coming in to lax 2008-10-17 20:12 huh! 2008-10-17 20:12 nice... 2008-10-17 20:12 well going away actually 2008-10-17 20:13 and too high for lax 2008-10-17 20:13 military possibly 2008-10-17 20:13 or flying from san diego somewhere 2008-10-17 20:13 hmm 2008-10-17 20:14 nice evening... 2008-10-17 20:14 yah, it's been like that all week 2008-10-17 20:14 bushfire special 2008-10-17 20:14 is that by the sea side? 2008-10-17 20:17 flips, mind explaining the figure by hirofumi ? 2008-10-17 20:17 the blocks... 2008-10-17 20:17 it shows graphically the interrelationship between blocks 2008-10-17 20:18 http://tux3.org/boardwalk.jpg 2008-10-17 20:18 this is where I work on the design each day by the way 2008-10-17 20:19 u have any pic of urs? 2008-10-17 20:20 bful place you live in 2008-10-17 20:20 I took these pics tonight 2008-10-17 20:20 about an hour ago 2008-10-17 20:21 http://tux3.org/casadelmere.jpg 2008-10-17 20:22 where we practice skating down the steps 2008-10-17 20:22 crazy guys like tim and shapor 2008-10-17 20:24 http://tux3.org/thepier.jpg 2008-10-17 20:28 nice... 
2008-10-17 20:28 http://tux3.org/carousel.jpg 2008-10-17 20:28 that's it, high points for tonight 2008-10-17 20:28 was fun skating with the camera 2008-10-17 20:29 someone shud have taken your pics tooo 2008-10-17 20:29 :) 2008-10-17 20:29 had to get from the skate park back to santa monica in about 5 minutes to catch the sunset 2008-10-17 20:29 flips: ur bandwidth is teh suck 2008-10-17 20:29 yeah.. pretty slow 2008-10-17 20:29 maybe tim_dimm will bring a real camera one time 2008-10-17 20:30 tim_dimm showed me a few videos last time 2008-10-17 20:30 skating down the hill 2008-10-17 20:30 http://tux3.org/casadelmere.jpg what is low bandwidth about this? 2008-10-17 20:30 oh 2008-10-17 20:30 my uplink 2008-10-17 20:30 true, and expensive too 2008-10-17 20:30 speakeasy 2008-10-17 20:30 not sure why I stick with them 2008-10-17 20:31 flips: http://shapor.com/bashgal 2008-10-17 20:31 I wonder if I can take my ip with me and go to verizon 2008-10-17 20:31 no way 2008-10-17 20:32 I thought there was a law that said I could 2008-10-17 20:32 whats with the ip? 2008-10-17 20:32 you have the domanin name... 2008-10-17 20:32 was ist das bashgal? 2008-10-17 20:32 sure 2008-10-17 20:32 redirect to the new site for a few days until the dns servers are updated... 2008-10-17 20:32 generates thumbnails 2008-10-17 20:32 and smaller pics 2008-10-17 20:33 right 2008-10-17 20:33 well 2008-10-17 20:33 i can lower the ttl so dns can flip over in 5 min 2008-10-17 20:33 every one of my pixels is loaded with goodness 2008-10-17 20:33 :) 2008-10-17 20:33 :) 2008-10-17 20:33 resolution is kind of rediculous on those 2008-10-17 20:34 flips, the dleaf block in the diagram ... 2008-10-17 20:34 in bitmap_dtree 2008-10-17 20:34 the extent table seems to be at the top... 
2008-10-17 20:35 followed by the entry and group index 2008-10-17 20:35 http://tux3.org/carousel.jpg <- on the right is a nice ramp I slalom down and tim does pirouettes down 2008-10-17 20:35 ok 2008-10-17 20:35 back to tux3 :) 2008-10-17 20:37 prani, low addresses are at the top of the box 2008-10-17 20:37 hm 2008-10-17 20:38 extent table is at the bottom of a dlead, just after the header that mainly has the groups count 2008-10-17 20:38 dleaf 2008-10-17 20:39 whats the field immediately after the magic field.. 2008-10-17 20:39 3rd row 2008-10-17 20:39 says extent 0 2008-10-17 20:40 hmm, so we are growing from higher address to lower address 2008-10-17 20:40 higher address is the actual top of the leaf... 2008-10-17 20:40 ? 2008-10-17 20:40 the block directory (dictionary) grows down, the extent table groups up 2008-10-17 20:40 lower address at the top of the dleaf box 2008-10-17 20:41 the block directory (dictionary) grows down towards lower addresses, the extent table groups up towards higher addresses 2008-10-17 20:41 hmm.. ok 2008-10-17 20:42 I have to get all those patches merged 2008-10-17 20:42 and get my post out 2008-10-17 20:42 sigh 2008-10-17 20:43 hardly time to fit in any sk8photography 2008-10-17 20:43 skateography 2008-10-17 20:46 :) 2008-10-17 20:47 flips: http://shapor.com/dp/ 2008-10-17 20:47 thats bashgal 2008-10-17 20:47 1. put pics in directory 2008-10-17 20:47 2. cd to directory 2008-10-17 20:47 3. run bashgal script :) 2008-10-17 20:47 autorotates and thumbnail generation 2008-10-17 20:47 I better get summa dat 2008-10-17 20:49 shapor, whos that skating guy? 
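The layout described above is the usual two-ended arrangement. The header and extent table sit at low addresses and grow up, the directory grows down from the end of the block, and free space lies in between. The free-space arithmetic is simple. In this sketch the 4096-byte block and the header, extent and directory entry sizes are hypothetical, not the real dleaf field sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, chosen only for illustration. */
#define BLOCK_SIZE   4096
#define HEADER_BYTES 8   /* magic, groups count, ... */
#define EXTENT_BYTES 8
#define DIRENT_BYTES 4

/* Toy dleaf: only the two moving boundaries are tracked. */
struct toy_dleaf {
    size_t head;  /* end of header + extent table; grows toward higher addresses */
    size_t tail;  /* start of directory; grows toward lower addresses */
};

static void dleaf_init(struct toy_dleaf *d)
{
    d->head = HEADER_BYTES;
    d->tail = BLOCK_SIZE;
}

/* Free space is whatever lies between the two growing regions. */
static size_t dleaf_free(const struct toy_dleaf *d)
{
    assert(d->head <= d->tail);
    return d->tail - d->head;
}

/* Claim room for one extent at the bottom; returns its offset or -1 if full. */
static long dleaf_add_extent(struct toy_dleaf *d)
{
    if (dleaf_free(d) < EXTENT_BYTES)
        return -1;
    long at = (long)d->head;
    d->head += EXTENT_BYTES;
    return at;
}

/* Claim room for one directory entry at the top; returns its offset or -1. */
static long dleaf_add_dirent(struct toy_dleaf *d)
{
    if (dleaf_free(d) < DIRENT_BYTES)
        return -1;
    d->tail -= DIRENT_BYTES;
    return (long)d->tail;
}
```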
2008-10-17 20:49 http://shapor.com/dp/.600/.html/carousel.jpg.html <- f2.8, nailed it 2008-10-17 20:50 that's just some sk8er dude 2008-10-17 20:50 I just asked for volunteers to do a couple tricks for the camera 2008-10-17 20:51 lol 2008-10-17 20:53 http://tux3.org/wheee.jpg 2008-10-17 20:53 this one needed a little faster exposure 2008-10-17 20:54 rather nice motion blur on the background 2008-10-17 20:55 guy on the bike looks like he's doing something unnatural with his nose though 2008-10-17 20:55 kay 2008-10-17 20:55 I'd better try bashgal 2008-10-17 20:55 optimize my pathetic bandwidth 2008-10-17 20:55 and I don't really need to share my potential postcard resolution with the entire internet 2008-10-17 20:56 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-17 20:56 hello 2008-10-17 20:56 Hi! 2008-10-17 20:56 hi 2008-10-17 20:57 flips: that shots a bit blurry 2008-10-17 20:57 contrast seems to bleed on that camera 2008-10-17 20:57 high contrast* 2008-10-17 20:57 that's what I was saying 2008-10-17 20:57 where? where? 2008-10-17 20:57 boardwalk where venice meets santa monica 2008-10-17 20:57 http://shapor.com/dp/.600/.html/carousel.jpg.html 2008-10-17 20:57 daniels photos from tonight 2008-10-17 20:58 hmm.. what's with the jpeg artifacts? :P 2008-10-17 20:58 what I needed was a big honking flash 2008-10-17 20:58 I like it :P 2008-10-17 20:58 why?? 2008-10-17 20:59 to get the fast closeup action 2008-10-17 20:59 I would have make the bottom part even darker ;-) 2008-10-17 20:59 no, I mean a big flash :) 2008-10-17 20:59 http://shapor.com/dp/.600/.html/thepier.jpg.html 2008-10-17 21:00 I think the pictures would look better with some more saturation to the colors ;-) 2008-10-17 21:00 I'll gimp them ;) 2008-10-17 21:00 unforunately I didn't get raws for those 2008-10-17 21:00 turned off raw for the sk8t pics 2008-10-17 21:00 oh... 
2008-10-17 21:01 the raw would have been handy here :P 2008-10-17 21:01 yo have more space in the buffer? 2008-10-17 21:01 let's see which ones got it 2008-10-17 21:01 yo = to 2008-10-17 21:01 ya, turned it off right at the beginning of the shoot 2008-10-17 21:02 next time will remember to turn it back on for the senery 2008-10-17 21:02 pretty nice jpgs though 2008-10-17 21:02 flips: nothing a decent p&s couldnt do ;) 2008-10-17 21:02 I don't like the jpg artifacts... :| 2008-10-17 21:02 show me :) 2008-10-17 21:02 could you increase the quality when the make the thumbs? :P 2008-10-17 21:03 http://tux3.org/carousel.jpg <- this one has saturation out the yinyang 2008-10-17 21:04 lookit those highlights 2008-10-17 21:04 almost not like digital 2008-10-17 21:05 though I reall want a 12 bit dac on my next camera 2008-10-17 21:05 http://www.flickr.com/cameras/ricoh/gr_digital/ 2008-10-17 21:06 flips: can you share the original from carousel? :P 2008-10-17 21:06 ACTION doesn't trust cameras with square holes at the front 2008-10-17 21:06 square holes at the front? 2008-10-17 21:06 that's the original 2008-10-17 21:06 aaa... ok :D 2008-10-17 21:06 check it out at one to one 2008-10-17 21:07 let me play a little with it :P 2008-10-17 21:08 ACTION feels an unsharp mask coming 2008-10-17 21:08 nooo :P 2008-10-17 21:08 :) 2008-10-17 21:09 I'd like my next camera to have 1/4 the noise at that light level too 2008-10-17 21:09 same as wanting a 12 bit adc 2008-10-17 21:09 I like noise ;-) 2008-10-17 21:09 I like being able to turn it off 2008-10-17 21:09 ACTION likes signal 2008-10-17 21:09 noise is data too 2008-10-17 21:10 you slept through your information theory class 2008-10-17 21:10 :P 2008-10-17 21:10 http://razvan.musaloiu.com/shoebox/ till I play with some photoshop... 2008-10-17 21:11 noise is to signal as shit is to strawberries 2008-10-17 21:11 RazvanM: some nice photos 2008-10-17 21:11 shapor: thanks... 
:P 2008-10-17 21:12 art :) 2008-10-17 21:17 why current kernel (junkfs) doesn't use page/buffer cache? 2008-10-17 21:18 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-17 21:19 hirofumi, just because we haven't added that yet 2008-10-17 21:20 we going to use those normal way? 2008-10-17 21:20 mostly 2008-10-17 21:20 i see. 2008-10-17 21:20 but we will use those bio transfer functions directly to update buffer pages instead of using the block library I think 2008-10-17 21:21 and we're going to do some pretty fancy things with the buffers in page cache to avoid blocking 2008-10-17 21:21 writing about that now 2008-10-17 21:21 i see 2008-10-17 21:22 and it manage locking of buffers for forward logging? 2008-10-17 21:30 it will lock buffers, but not for forward logging 2008-10-17 21:30 manages not overwriting buffers that haven't been written out yet 2008-10-17 21:30 by removing them from the buffer cache before overwriting 2008-10-17 21:32 http://farm4.static.flickr.com/3011/2951108022_9cc58e9464_b.jpg 2008-10-17 21:32 flips: perhaps not the way you wanted :P 2008-10-17 21:49 um.. ah, COW? well, it seems I have to wait next doc. thanks. 2008-10-17 22:34 razvanm :) 2008-10-17 22:34 hirofumi, right 2008-10-17 22:34 mooo 2008-10-17 22:37 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-17 23:19 ACTION is (re)watching Chunking Express... 2008-10-18 00:00 have you seen fallen angels? 2008-10-18 00:15 yup 2008-10-18 00:15 (just finished the Chinking Express) 2008-10-18 00:16 just saw both recently myself 2008-10-18 00:16 I also liked Falledn Angels :P 2008-10-18 00:16 :D 2008-10-18 00:16 I think it's his best 2008-10-18 00:16 how about the Ashes of Time? :D 2008-10-18 00:16 or maybe I just like vinyl skirts 2008-10-18 00:16 ashes was funny 2008-10-18 00:17 the first one I watch was In the Mood for Love... 
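The answer to hirofumi above says that a buffer still waiting to be written out is removed from the buffer cache before it is overwritten, so the pending write keeps stable data. A toy, single-threaded C cache can illustrate the idea. Everything here (the fixed-size `cache` array, the `CLEAN`/`DIRTY`/`WRITING` states, `get_writable`, `end_write`) is a hypothetical stand-in. It is not the Linux page cache API or tux3 code, and real code would also need locking and reference counting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy cache: names and states are illustrative assumptions. */
#define NBLOCKS 16
#define BLOCKSIZE 64

enum state { CLEAN, DIRTY, WRITING };

struct buffer {
    unsigned block;
    enum state state;
    char data[BLOCKSIZE];
};

static struct buffer *cache[NBLOCKS]; /* one cached buffer per block, or NULL */

struct buffer *getblk(unsigned block)
{
    assert(block < NBLOCKS);
    if (!cache[block]) {
        struct buffer *b = calloc(1, sizeof *b);
        if (!b)
            return NULL;
        b->block = block;
        cache[block] = b;
    }
    return cache[block];
}

/* Writeout started: contents must not change until the I/O completes. */
void start_write(struct buffer *b)
{
    b->state = WRITING;
}

/* Return a buffer that is safe to modify. If the cached buffer is under
   writeout, detach it from the cache (the writer keeps its own pointer)
   and install a copy for the new changes. */
struct buffer *get_writable(unsigned block)
{
    struct buffer *b = getblk(block);
    if (!b)
        return NULL;
    if (b->state == WRITING) {
        struct buffer *copy = malloc(sizeof *copy);
        if (!copy)
            return NULL;
        memcpy(copy, b, sizeof *copy);
        cache[block] = copy; /* old buffer now belongs only to the writer */
        b = copy;
    }
    b->state = DIRTY;
    return b;
}

/* Writeout finished: free a detached buffer, otherwise mark it clean. */
void end_write(struct buffer *b)
{
    if (cache[b->block] == b)
        b->state = CLEAN;
    else
        free(b);
}
```

The writer never blocks the modifier here. The cost is one extra copy of the block while its old contents are in flight.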
2008-10-18 00:17 put me to sleep 2008-10-18 00:17 :P 2008-10-18 00:17 I watched Ashed of Time several times to understand something from it ;-) 2008-10-18 00:17 you can only take so many hours of chungsams 2008-10-18 00:17 I also like it though ;-) 2008-10-18 00:18 chungsams ?!? 2008-10-18 00:18 chinese dress 2008-10-18 00:18 (I also had to watch a bunch of time In The Mood for Love to figure out who is who ;-)) 2008-10-18 00:18 :D 2008-10-18 00:22 http://images.google.com/images?hl=en&q=Cheongsam 2008-10-18 00:22 nice... 2008-10-18 00:22 somehow related: http://razvan.musaloiu.com/2006/12/03/zhou-yu 2008-10-18 00:22 they've been working on it for thousands of years I understand 2008-10-18 00:23 they still love amazing :D 2008-10-18 00:23 (to me at least :P) 2008-10-18 00:24 have you seen chinese ghost story? 2008-10-18 00:24 don't think so? 2008-10-18 00:24 (they should have more pictures here: http://en.wikipedia.org/wiki/Cheongsam ) 2008-10-18 00:25 http://en.wikipedia.org/wiki/A_Chinese_Ghost_Story 2008-10-18 00:25 if you have a thing for chinese you need to see it 2008-10-18 00:25 http://www.imdb.com/title/tt0093978/ 2008-10-18 00:26 I never heard of it :( 2008-10-18 00:26 cult hit 2008-10-18 00:26 87... I was in 2nd grade then :P 2008-10-18 00:28 awesome... they have at the library 2008-10-18 00:29 (I canceled my netflix subscription a few weeks ago) 2008-10-18 00:29 figured you saw everything worth seeing? 2008-10-18 00:29 nope... I didn't have time to see any of the movies for more than one year :D 2008-10-18 00:30 it was a relief 2008-10-18 00:30 :D 2008-10-18 00:31 I had to watch again Chunking Express though... 
it fades in my memory and I need a refresh from time to time 2008-10-18 00:31 some sort of timeout ;-) 2008-10-18 00:32 now I'll head to bed 2008-10-18 00:32 good night 2008-10-18 00:32 I fixed one last leak for romfs ;-) 2008-10-18 00:33 tomorrow I'll try minix :P 2008-10-18 00:33 and time to watch a netflix movie with my wife 2008-10-18 00:33 doctor strangelove 2008-10-18 00:33 aaaaaaaaaaaa 2008-10-18 00:33 very-very nice! :D 2008-10-18 00:33 extra nice ;-) 2008-10-18 00:33 enjoy :D 2008-10-18 00:33 :) 2008-10-18 00:33 see you 2008-10-18 00:33 bye 2008-10-18 01:04 -!- Bobby_(~Bobby@122.162.70.80) has joined #tux3 2008-10-18 01:04 hey all 2008-10-18 02:16 -!- Bobby_(~Bobby@122.162.74.234) has joined #tux3 2008-10-18 02:20 -!- Bobby_(~Bobby@122.162.72.218) has joined #tux3 2008-10-18 02:22 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-18 02:23 -!- prani(~bobby@122.162.72.218) has joined #tux3 2008-10-18 03:33 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-18 03:34 -!- Bobby_(~Bobby@122.162.72.218) has joined #tux3 2008-10-18 03:46 -!- prani(~bobby@122.162.72.218) has joined #tux3 2008-10-18 06:57 -!- pgquiles(~pgquiles@252.Red-83-41-113.dynamicIP.rima-tde.net) has joined #tux3 2008-10-18 07:46 -!- stargazr5(~gauravstt@59.95.27.129) has joined #tux3 2008-10-18 08:58 -!- stargazr5(~gauravstt@59.95.22.5) has joined #tux3 2008-10-18 09:24 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-18 09:37 -!- Bobby_(~Bobby@122.162.72.218) has joined #tux3 2008-10-18 10:23 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-18 11:00 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-18 12:01 -!- bobby(~bobby@122.162.68.224) has joined #tux3 2008-10-18 12:01 -!- Bobby_(~Bobby@122.162.68.224) has joined #tux3 2008-10-18 12:01 hey all 2008-10-18 12:20 flips, shapor dleaf pics pleaseee 2008-10-18 12:20 i think i got a pretty good idea from 
hirofumi's diagram 2008-10-18 12:20 it shows the dleaf structure there... 2008-10-18 12:23 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-18 14:20 ACTION is experimenting with mercurial... 2008-10-18 14:38 mercurial rocks 2008-10-18 14:44 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-18 15:08 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-18 16:17 ACTION is trying to revive an old G3 with Panther... 2008-10-18 16:26 -!- flips(~phillips@phunq.net) has joined #tux3 2008-10-18 16:37 http://tux3.org/images/.600/.html/boardwalk.jpg.html <- gallery courtesy of shapor 2008-10-18 16:38 probably should move this to phunq 2008-10-18 16:38 http://phunq.net/images/ <- there we go 2008-10-18 16:40 nice... can you take more pictures? 2008-10-18 16:40 you too :) 2008-10-18 16:40 nice art you made yesterday 2008-10-18 16:40 very postcard 2008-10-18 16:41 http://phunq.net/images/sunset/ <- new home 2008-10-18 16:42 I like the 9x16 format ;-) 2008-10-18 16:42 entirely shapor's doing 2008-10-18 16:42 well and the camera 2008-10-18 16:42 no crops or retouching 2008-10-18 16:43 http://phunq.net/images/sunset/.1024/.html/carousel.jpg.html <- what do you call the darkening effect at the corners 2008-10-18 16:43 oh... so shapor took all the pictures? :D 2008-10-18 16:43 heh, did all the gallery processing 2008-10-18 16:43 http://en.wikipedia.org/wiki/Vignetting 2008-10-18 16:43 :D 2008-10-18 16:43 right 2008-10-18 16:44 especially prominent at f2.8 2008-10-18 16:44 what camera was it? 2008-10-18 16:44 wide angle 2008-10-18 16:44 20d 2008-10-18 16:44 aaa 2008-10-18 16:44 oldie but goodie 2008-10-18 16:45 can you take more? :D 2008-10-18 16:45 you are skating every day, right? 2008-10-18 16:46 right 2008-10-18 16:46 tonight without the camera backpack though 2008-10-18 16:46 stupid q: what kind of skating? 
2008-10-18 16:46 roller ;) 2008-10-18 16:46 :D 2008-10-18 16:46 street I guess is the word 2008-10-18 16:46 why a backpack? 2008-10-18 16:47 camera backpack 2008-10-18 16:47 you don't want a holster swinging around 2008-10-18 16:47 I would just tied the camera to my hand :P 2008-10-18 16:47 when slaloming down the hill 2008-10-18 16:47 3 kilos of camera 2008-10-18 16:47 uh... 2008-10-18 16:47 aaa... the lens! :D 2008-10-18 16:47 right 2008-10-18 16:47 do you have a cheap 50mm? :P 2008-10-18 16:47 the f1.8 type ;-) 2008-10-18 16:47 let me see, is it really that much 2008-10-18 16:47 I do 2008-10-18 16:48 takes good pics 2008-10-18 16:48 http://www.dpreview.com/reviews/specs/Canon/canon_eos20d.asp 2008-10-18 16:48 770g 2008-10-18 16:48 in some situations 2008-10-18 16:48 and the lens is 1.something kilos 2008-10-18 16:48 so 2 kilos 2008-10-18 16:48 the 1.8 would e also nice ;-) 2008-10-18 16:48 I'll get a couple primes 2008-10-18 16:48 btw, do you drive there? 2008-10-18 16:49 skate 2008-10-18 16:49 or is that close to where you live? 2008-10-18 16:49 along ocean 2008-10-18 16:49 nice :_) 2008-10-18 16:49 rich people's sidewalk 2008-10-18 16:49 then down a fairly respectable hill 2008-10-18 16:49 so how long is this? 2008-10-18 16:49 several km? 2008-10-18 16:50 (I left my roller blades in Ro :|) 2008-10-18 16:50 right, about 10 km/day 2008-10-18 16:50 when I don't skate with tim that is 2008-10-18 16:50 with tim is more? 
:P 2008-10-18 16:51 (something funny, you sk8 hour is around 8 here :D) 2008-10-18 16:51 heh 2008-10-18 16:53 http://phunq.net/images/sunset/.1024/.html/carousel.jpg.html <- on the right side of that pic is a ramp I slalom on 2008-10-18 16:53 there are a bunch of posts at the bottom, so it has to be controlled 2008-10-18 16:53 shapor would probably just tuck and aim for a gap between the posts 2008-10-18 16:53 :P 2008-10-18 16:53 narrowly missing a couple baby carriages at the bottom 2008-10-18 16:54 tim does airials and spins all the way down 2008-10-18 16:54 ?? 2008-10-18 16:54 narrlowly missing the posts at the bottom, but not by chance as with shapor ;) 2008-10-18 16:55 aerial -> jump up in the air and do a trick 2008-10-18 16:55 like land backwards 2008-10-18 16:55 or spin 2008-10-18 16:55 oh... you guys are all good :P 2008-10-18 16:55 shapor and I are relative beginners 2008-10-18 16:55 tim is hardcore 2008-10-18 16:56 any pictures of them? 2008-10-18 16:56 there were two guys in your pictures... 2008-10-18 16:56 random dudes 2008-10-18 16:56 we'll get pics one day 2008-10-18 16:56 tim took a couple of me with his iphone 2008-10-18 16:56 noisy 2008-10-18 16:57 and timing is almost impossible with the shutter lag 2008-10-18 16:57 I'll bring down the canon and hand it to him one day 2008-10-18 16:57 let me guess, it will contain 4 people and the letters on their shirts will spell: t u x 3 :D 2008-10-18 16:57 heh 2008-10-18 16:57 about time to get rolling 2008-10-18 16:58 enjoy 2008-10-18 16:58 the G3 is almost ready... 2008-10-18 16:58 g3? 2008-10-18 16:59 PPC G3 2008-10-18 16:59 there was an old one laying around 2008-10-18 16:59 266Mhz 2008-10-18 16:59 I just put 2x128 in it 2008-10-18 16:59 ah 2008-10-18 16:59 os/x ? 2008-10-18 16:59 yup 2008-10-18 16:59 10.3 2008-10-18 17:00 I didn't have any luck with any of the bsd 2008-10-18 17:00 somebody else tried a linux but there was no X 2008-10-18 17:01 what will you use it for? 2008-10-18 17:01 fuse? 
2008-10-18 17:01 I wanted to make it work when I didn't have a PPC around 2008-10-18 17:02 the tricks that I did to compile the linux modules on my mac depends on the fact that is intel 2008-10-18 17:02 I wanted to get them to compile on PCC also 2008-10-18 17:03 Ral has a PPC but I didn't want to work on her machine 2008-10-18 17:03 she is using wifi sometime :P 2008-10-18 17:04 hardcore 2008-10-18 17:53 wow... for minix I have 50 more functions to add :| 2008-10-18 18:20 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-18 18:27 block_sync_page looks can be nothing... 1 down, 49 to go 2008-10-18 18:36 find_first_zero_bit... 2 down, 48 to go 2008-10-18 18:48 clear_inode... 3 down, 47 to go 2008-10-18 19:08 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-18 19:57 kmap_atomic... 4 down, 46 to go 2008-10-18 20:01 implementing the linux vfs on minix? 2008-10-18 20:01 nope 2008-10-18 20:01 no, os/x I suppose 2008-10-18 20:01 compiling minix fs in mac os :D 2008-10-18 20:02 what is it page_follow_link_light and page_put? 2008-10-18 20:02 some old stuff? 2008-10-18 20:02 there will be a bunch of support functions not having much to do with modern filesystems 2008-10-18 20:02 hmm... it is in ext4 though 2008-10-18 20:02 follow__light is a new one for me 2008-10-18 20:03 some optimization of symlink I suppose 2008-10-18 20:03 and you'd have to read path_walk and friends 2008-10-18 20:03 I suppose we should do that one day 2008-10-18 20:03 :D 2008-10-18 20:03 I've never completely analyzed it myself, it's a royal mess 2008-10-18 20:03 another one 2008-10-18 20:04 the cool thing is I can fake some stuff if they will never be called :P 2008-10-18 20:05 just compile them all as error stubs, execute the mount and fix the first thing that breaks 2008-10-18 20:05 that's the approach 2008-10-18 20:05 I spent some time to understand them though 2008-10-18 20:05 been there ;) 2008-10-18 20:05 at least a little :D 2008-10-18 20:05 hehe... 
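The porting approach mentioned above is to compile every missing kernel function as an error stub, run the mount, and implement whatever breaks first. In C it might look like the following. The `STUB` macro, `first_missing`, and the `my_*` functions are hypothetical. Real stubs would need each function's actual prototype rather than `int f(void)`.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical porting aid: records the first unimplemented function hit. */
static const char *first_missing;

/* Define a function that only reports itself and fails with -ENOSYS. */
#define STUB(name)                                          \
    int name(void)                                          \
    {                                                       \
        if (!first_missing)                                 \
            first_missing = #name;                          \
        fprintf(stderr, "unimplemented: %s\n", #name);      \
        return -ENOSYS;                                     \
    }

STUB(my_bread)
STUB(my_follow_link)

/* Already ported for real. */
int my_sync_page(void)
{
    return 0;
}

/* Simulated mount path: the first failing call names the next thing to implement. */
int my_mount(void)
{
    int err;
    if ((err = my_sync_page()) != 0)
        return err;
    if ((err = my_bread()) != 0)
        return err;
    return my_follow_link();
}
```

Stubs that are never reached during mount can stay stubs, which is the shortcut mentioned in the chat for functions that will never be called.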
2008-10-18 20:06 in what context? 2008-10-18 20:06 ported an app from an embedded controller to msdos 2008-10-18 20:06 uuu... 2008-10-18 20:06 had to emulate the graphics controller 2008-10-18 20:07 dozens of features, some of them bizarre 2008-10-18 20:07 different graphics memory layout, line acceleration... 2008-10-18 20:07 everything to make life miserable 2008-10-18 20:07 did it work? :D 2008-10-18 20:07 it took over the world 2008-10-18 20:07 did it worth the effort? 2008-10-18 20:08 canon cameras are running msdos... 2008-10-18 20:08 http://www.herkules-group.com/ >- bailed these guys out of a dead end and they now monopolize the world's roll grinder industry 2008-10-18 20:09 roll grinder... serious stuff! :D 2008-10-18 20:09 big stuff... 2008-10-18 20:09 (I'm looking at the pictures) 2008-10-18 20:09 chances are you used some plastic wrap in the last week made by a roll ground by my software 2008-10-18 20:10 and are driving around in a car wearing sheet metal made by a roll ground by my software ;) 2008-10-18 20:10 :D 2008-10-18 20:11 and few home for christmas in a plane with superstructure made by... etc etc 2008-10-18 20:11 so far I only know a bug that affects all the mac os x I tried so far :P 2008-10-18 20:12 bugs in that software tend to make large explosions and/or cut people in half 2008-10-18 20:13 "crash and burn" literally 2008-10-18 20:13 how big was your code? 2008-10-18 20:13 http://www.mohansteels.com/rollmill3.jpg 2008-10-18 20:13 bug free? 2008-10-18 20:13 pretty big 2008-10-18 20:13 not bug free, but reliable 2008-10-18 20:14 far more than the original 2008-10-18 20:14 I never have the chance to walk in such a place 2008-10-18 20:14 did some of the debugging live in places like in the picture, wearing a hard hat 2008-10-18 20:14 downtime cost $90,000 hour, so upgrades had to be fast 2008-10-18 20:17 http://www.sumitomometals.co.jp/e/osakasteelworks/vc/images/vc3-3.jpg 2008-10-18 20:18 :D 2008-10-18 20:18 how long were the debug sessions? 
2008-10-18 20:18 days or weeks 2008-10-18 20:18 had to be careful 2008-10-18 20:19 finding other people's bugs pretty much 2008-10-18 20:19 how long have you done this? 2008-10-18 20:19 usually by replacing masses of code with stuff a fraction of the size 2008-10-18 20:19 did that for two or three years 2008-10-18 20:19 did you like it? :P 2008-10-18 20:20 would not have given up the experience 2008-10-18 20:20 rather different than the rest of my life ;) 2008-10-18 20:20 whole nuther side of the world 2008-10-18 20:20 don't need any more of that 2008-10-18 20:20 this was before Google? 2008-10-18 20:21 or much earlier? 2008-10-18 20:21 way before 2008-10-18 20:21 been a linux hacker since 2008-10-18 20:22 (nameidata... another thing I don't know about...) 2008-10-18 20:22 it could have been something else beside linux? 2008-10-18 20:22 bsd perhaps? 2008-10-18 20:22 mac os? :D 2008-10-18 20:22 http://www.industry.siemens.com/metals/EN/solutions/autom_siflat.htm <- you don't want to step on one of these when it's moving 2008-10-18 20:22 not much to prevent that either 2008-10-18 20:23 should have been linux 2008-10-18 20:23 the reason I left is because the company decided to do the next generation in windows instead of linux 2008-10-18 20:23 I didn't want to stick around to clean up the body parts 2008-10-18 20:23 they're probably using linux by now 2008-10-18 20:23 and they did do in windows?!? 2008-10-18 20:24 aha... :D 2008-10-18 20:24 they did, just the user interface 2008-10-18 20:24 and used off the shelf controllers for the back end 2008-10-18 20:24 still 2008-10-18 20:25 the off the shelf controllers are almost certainly running linux now 2008-10-18 20:25 then they were probalby wind river or similar 2008-10-18 20:25 VxWorks? 
2008-10-18 20:25 probably 2008-10-18 20:26 details weren't interesting to me 2008-10-18 20:26 in linux, you do all the stuff the controller is doing and throw away the controller 2008-10-18 20:26 control the motors directly, throw away the motor controller too 2008-10-18 20:26 :D 2008-10-18 20:27 you still need something to get to those motors 2008-10-18 20:27 http://planet.wwu.edu/fall04/images/carmeltedscrap.jpg 2008-10-18 20:27 wish I had my camera then 2008-10-18 20:27 exactly! :D 2008-10-18 20:27 these shots don't give have the sense of what it's like 2008-10-18 20:27 it's like science fiction 2008-10-18 20:27 :-) 2008-10-18 20:29 http://www.hebig.org/blogs/archives/main/IMG_7068_4.jpg <- that's more like it 2008-10-18 20:29 crappy photo though 2008-10-18 20:30 I wonder if they let people take picture now... 2008-10-18 20:30 sure 2008-10-18 20:30 do they? 2008-10-18 20:30 no secrets 2008-10-18 20:30 they use the photos to get business 2008-10-18 20:31 :D 2008-10-18 20:31 you by equipment by the pound in that industry 2008-10-18 20:31 there's only one source for most of it 2008-10-18 20:31 where are this things? can you ask to go and visit? 2008-10-18 20:31 you can 2008-10-18 20:31 they get so few visitors you'd get the royal treatment 2008-10-18 20:32 but where are these? 2008-10-18 20:32 usually in the middle of nowhere where land is cheap and the cows don't complain about the noise and smoke 2008-10-18 20:32 ouch... 2008-10-18 20:32 you're in chicago, right? 
probably don't have far to go to get to the rust belt 2008-10-18 20:32 touring america and taking pics in some of these places would be awesome :D 2008-10-18 20:33 I'm in Baltimore :P 2008-10-18 20:33 even better I think 2008-10-18 20:33 I don't jave a car though :| 2008-10-18 20:33 probably got a rolling mill or two on the same block ;) 2008-10-18 20:33 school is in the walking distance and so is the supermarket :P 2008-10-18 20:34 ok, maryland, shows how much american geography I know 2008-10-18 20:34 the mills all moved west long time ago 2008-10-18 20:34 the JHU campus is inside the city 2008-10-18 20:35 how's the weather right now? 2008-10-18 20:35 pretty warm 2008-10-18 20:36 I go to school in short sleeve 2008-10-18 20:36 looks like a nice situation 2008-10-18 20:36 I put a jacket when I come though 2008-10-18 20:36 right now the max is around 20-22 and the min about 10 2008-10-18 20:36 I prefer much warmer weather :P 2008-10-18 20:36 Singapore was heaven :D 2008-10-18 20:37 http://www.baltimoresun.com/media/photo/2008-03/37043371.jpg <- close to you 2008-10-18 20:38 how did you find that? :D 2008-10-18 20:38 creative googling 2008-10-18 20:39 maryland steel factories? :D 2008-10-18 20:39 maps.google.com, go to baltimore, type in steel mill, look for one close to you, type in the address, go to images 2008-10-18 20:39 let's see... 2008-10-18 20:40 http://maps.google.com/maps?f=q&hl=en&geocode=&q=%22steel+mill%22+maryland&ie=UTF8&filter=0&z=7 2008-10-18 20:41 that would have been me, wearing a blue hard hat and birkenstocks 2008-10-18 20:41 perhaps I should mention the terrible state of the public transportation in baltimore :P 2008-10-18 20:41 birkenstocks? 2008-10-18 20:42 http://en.wikipedia.org/wiki/Birkenstocks 2008-10-18 20:42 german sandals 2008-10-18 20:42 still have the same ones 2008-10-18 20:42 they last forever 2008-10-18 20:42 can you wear something like this in a steel factory?? 
2008-10-18 20:49 they care more about your head than your feet 2008-10-18 20:49 save the important parts :D 2008-10-18 20:50 just be careful not to step on anything that's glowing 2008-10-18 20:51 43 calls to go... 2008-10-18 20:51 I should go home :| 2008-10-18 20:53 funny: the fs is exactly the last chapter in operating systems, design and implementation :P 2008-10-18 20:58 uses every aspect of the fs just about 2008-10-18 20:58 with the exception of some multimedia 2008-10-18 21:21 going home... 2008-10-18 21:21 have a nice evening! 2008-10-18 21:21 thanks for chat :P 2008-10-18 22:07 folks 2008-10-18 22:09 hi 2008-10-18 22:15 how's it going ? 2008-10-18 22:16 finally got some good results out of my schedule instrumentation and I found that the cpu_idle thread is a significant source of lock contention against the rq locks which is surprising 2008-10-18 22:16 I haven't posted my patch nor the result to lkml yet 2008-10-18 22:24 there might be some logic that too aggressive to try and idle a processor under various thread loads and it might be another aspect that's putting a lot of pressure against those rq locks 2008-10-18 22:24 ACTION thinks about file system parallelism 2008-10-18 22:42 -!- prani(~Bobby@122.163.48.97) has joined #tux3 2008-10-18 22:43 -!- prani(~Bobby@122.163.48.97) has joined #tux3 2008-10-18 23:40 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-18 23:53 -!- Bobby_(~Bobby@122.163.48.97) has joined #tux3 2008-10-19 00:04 -!- bobby(~bobby@122.163.48.97) has joined #tux3 2008-10-19 00:23 -!- bobby(~bobby@122.163.48.97) has joined #tux3 2008-10-19 03:07 -!- paola(~paola@ppp-139-17.20-151.libero.it) has joined #tux3 2008-10-19 04:35 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-10-19 09:47 -!- Bobby_(~Bobby@122.162.71.160) has joined #tux3 2008-10-19 09:55 -!- bobby(~bobby@122.162.71.160) has joined #tux3 2008-10-19 10:06 -!- stargazr5(~gauravstt@59.95.14.136) has joined 
#tux3 2008-10-19 10:18 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-19 11:03 -!- pgquiles(~pgquiles@71.Red-79-154-137.staticIP.rima-tde.net) has joined #tux3 2008-10-19 11:20 -!- pgquiles(~pgquiles@71.Red-79-154-137.staticIP.rima-tde.net) has joined #tux3 2008-10-19 12:07 -!- Bobby_(~Bobby@122.162.69.167) has joined #tux3 2008-10-19 12:07 hey akk 2008-10-19 12:07 all* 2008-10-19 12:07 anyone ever heard of 1Gbps data transfer? 2008-10-19 12:09 -!- zbrown(~rufius@208.64.37.45) has left #tux3 2008-10-19 12:41 look up agami systems wikipedia 2008-10-19 12:45 -!- Bobby_(~Bobby@122.162.69.103) has joined #tux3 2008-10-19 13:54 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-19 14:14 hmm... I think I reach the point where I cannot avoid buffer heads anymore... 2008-10-19 15:05 doctor strangelove was great 2008-10-19 15:05 peter sellers is beyond brilliant 2008-10-19 15:07 that's the guy that played a few characters there? 2008-10-19 15:08 that's him 2008-10-19 15:08 I didn't know when I watch it :D 2008-10-19 15:08 including the illustrious doctor 2008-10-19 15:09 I also liked the texan pilot :P 2008-10-19 15:09 (which he was suppose to also play) 2008-10-19 15:09 slim pickens 2008-10-19 15:10 apparently not acting 2008-10-19 15:10 that's how he really was 2008-10-19 15:10 yeah! :D 2008-10-19 15:10 I read about that 2008-10-19 15:10 he came in boots and with the hat from the first day :P 2008-10-19 15:11 it's amazing how sellers could change his face to play the president 2008-10-19 15:11 didn't recognize him 2008-10-19 15:11 Kubrick biographer John Baxter further explains in the documentary Inside the Making of Dr. Strangelove: 2008-10-19 15:11 As it turns out, Slim Pickens had never left the United States. He had to hurry and get his first passport. He arrived on the set, and somebody said, "Gosh, he's arrived in costume!," not realizing that that's how he always dressed... 
with the cowboy hat and the fringed jacket and the cowboy boots, and that he wasn't putting on the character; that's the way he talked. 2008-10-19 15:11 (from wikipedia :D) 2008-10-19 15:12 seems to be rated about number 5 movie of all time 2008-10-19 15:12 on lots of lists 2008-10-19 15:13 President Merkin Muffley: Gentlemen, you can't fight in here! This is the War Room. 2008-10-19 15:13 I saw it when it first came out, I guess my dad didn't know it wasn't for kids 2008-10-19 15:13 wow! :D 2008-10-19 15:13 all I remembered from it was the guy sitting on the bomb waving his hat 2008-10-19 15:13 how old were you? 2008-10-19 15:13 8 2008-10-19 15:13 :D 2008-10-19 15:14 actually, my 4 year old could relate to the sitting on the bomb scene 2008-10-19 15:15 she need to know why he wanted to sit on the bomb 2008-10-19 15:15 the part about the bodily fluids was hilarious... 2008-10-19 15:15 answer: because he'd stupid 2008-10-19 15:15 :-) 2008-10-19 15:16 reading out the phone call... serious right up until the last line 2008-10-19 15:17 Director Kubrick tricked Scott into playing the role of Gen. Turgidson far more ridiculously than Scott was comfortable with doing. Kubrick talked Scott into doing "over the top" practice takes, which Kubrick told Scott would never be used, as a way to warm up for the "real" takes. Subsequently, Kubrick used these takes in the final film, causing Scott to swear never to work with Kubrick again.
2008-10-19 15:18 (also from wikipedia) 2008-10-19 15:18 considered scott's breakout role 2008-10-19 15:18 so I suppose it was just a quick swear 2008-10-19 15:19 kubrick gets my vote for greatest director of all time 2008-10-19 15:20 I think the doctor strangelove is the only film I watch from him 2008-10-19 15:20 awesome song for the ending :P 2008-10-19 15:20 there's 2001 2008-10-19 15:20 barry lyndon 2008-10-19 15:21 the shining 2008-10-19 15:21 clockwork orange 2008-10-19 15:21 all must see 2008-10-19 15:21 aaaa 2001 2008-10-19 15:21 he's got at least 5 films in the top 100 2008-10-19 15:21 I have to see that 2008-10-19 15:25 apparently some of the terrain scenes in strangelove were recycled in the light show part of 2001 2008-10-19 15:25 optically/chemically alterered 2008-10-19 15:25 didn't know about that :D 2008-10-19 15:26 64, 68... 2008-10-19 15:26 4 years apart 2008-10-19 15:26 aaa... I watched Eyes Wide Shut :P 2008-10-19 15:27 I need to see it 2008-10-19 15:28 critics thought the pacing was slow... that's a good sign 2008-10-19 15:28 not with the kids around though :P 2008-10-19 15:28 it was slow... 2008-10-19 15:28 that's what they thought about barry lyndon, which is since recognized as one of the great films 2008-10-19 15:28 one of my favorites 2008-10-19 15:29 need to watch that 2008-10-19 15:29 cool... they also have it at the library ;-) 2008-10-19 15:30 checkout out but the next free time is not soon 2008-10-19 15:30 you've got plenty of time ;) 2008-10-19 15:31 "after five or ten years came the realization that 2001 or Barry Lyndon or The Shining was like nothing else before or since" -- Scorsese 2008-10-19 15:31 exactly 2008-10-19 15:31 :D 2008-10-19 15:32 ah, full meta jacket, another kubrick 2008-10-19 15:33 metal even 2008-10-19 15:33 metajacket... sounces like sequel to men in black 2008-10-19 15:34 btw, did you like Red Thin Line? 2008-10-19 15:35 didn't see it 2008-10-19 15:35 http://www.imdb.com/title/tt0120863/ 2008-10-19 15:35 1998? 
2008-10-19 15:35 I like it a lot 2008-10-19 15:36 http://www.rottentomatoes.com/m/1084146-thin_red_line/ 2008-10-19 15:36 yup 2008-10-19 15:36 it's from '98 2008-10-19 15:36 I'll check it out 2008-10-19 15:37 http://www.youtube.com/watch?v=Gm6ZgOBlzII 2008-10-19 15:37 funny how kubrick seems to have intentionally made one of each of the major film genre's, almost 2008-10-19 15:37 no western 2008-10-19 15:37 I'm a sucker for voiceovers ;-) 2008-10-19 15:40 just watched the trailer... 2008-10-19 15:40 I had to watch it again 2008-10-19 15:40 thin red line? 2008-10-19 15:41 yup 2008-10-19 15:41 ok, it's on my list 2008-10-19 15:41 let me know what do you think after you watch it :D 2008-10-19 15:44 -!- paola(~paola@ppp-139-17.20-151.libero.it) has left #tux3 2008-10-19 15:46 #include "itree_common.c" :D 2008-10-19 15:46 in itree_v1.c from minix 2008-10-19 15:47 :) 2008-10-19 15:47 so there's some precedent... from linus maby even 2008-10-19 15:48 source includes are useful and obvious technique, it's strange how most c coders consider it somehow dirty 2008-10-19 15:48 * Copyright (C) 1991, 1992 Linus Torvalds 2008-10-19 15:48 * 2008-10-19 15:48 * Copyright (C) 1996 Gertjan van Wingerde (gertjan@cs.vu.nl) 2008-10-19 15:48 * Minix V2 fs support. 2008-10-19 15:49 in my case I didn't expect to find it in the middle of the itree_v1.c :D 2008-10-19 15:49 I was tracking a static function in itree_common.c 2008-10-19 15:49 and it was not used in itree_common.c ;-) 2008-10-19 15:51 shapor, brilliant merge 2008-10-19 15:52 post your merge helper scripts on the list? 2008-10-19 16:06 trying to think of all the things that need to be done for the kernel port now 2008-10-19 16:07 so far the testing was done only using fuse? 2008-10-19 16:07 and that very light 2008-10-19 16:07 so light that I think it's been broken for a couple weeks 2008-10-19 16:07 since the extents merge 2008-10-19 16:07 so the more serious one was using userland programs? 
2008-10-19 16:08 there hasn't been serious testing 2008-10-19 16:08 ack :D 2008-10-19 16:09 we're getting closer though 2008-10-19 16:10 as soon as I get over this atomic commit hump, and produce some code for it then that finishes the big hacking projects in user space for the time being 2008-10-19 16:10 well, except for making fuse actually work well 2008-10-19 16:10 coding the atomic commit will hopefully start tomorrow 2008-10-19 16:12 38 more calls... I'm slow... 2008-10-19 16:13 now how are we going to do the source code management for the kernel port 2008-10-19 16:13 hg? 2008-10-19 16:14 I'm thinking, we have a script that copies in the files from the userspace directory, maybe makes some slight changes to them, then applies a patch to produce the kernel code 2008-10-19 16:14 and we hack on the kernel code in-tree, with a git tree cloned from mainline 2008-10-19 16:15 so to update the above patch, we run the import script then do git-diff 2008-10-19 16:15 why not make the userland compatible with the kernel code? :P 2008-10-19 16:15 so that way we keep the user space and kernel code from diverging a lot for the time being 2008-10-19 16:15 (like I'm doing now ;-)) 2008-10-19 16:15 we will make it mostly compatible 2008-10-19 16:15 but there are obvious places where it can't be 2008-10-19 16:15 like be don't need buffer.c or diskio.c at all 2008-10-19 16:16 s/be/we/ 2008-10-19 16:16 bok... I'll make the code run on my vfs then :P 2008-10-19 16:16 the kernel code I mean 2008-10-19 16:16 good luck 2008-10-19 16:16 why not just hack on the real kernel? 2008-10-19 16:17 I like the userland 2008-10-19 16:17 I'll set up a uml tarball 2008-10-19 16:17 but I also like to keep the code untouched 2008-10-19 16:17 you can work completely in userland 2008-10-19 16:17 just not on macos 2008-10-19 16:17 well... 
I'm am working on macos ;) 2008-10-19 16:17 :D 2008-10-19 16:17 all good things come to an end ;) 2008-10-19 16:17 ACTION isn't sure whether that's actually true 2008-10-19 16:18 hmm... the market going up did :P 2008-10-19 16:18 it I'm able to run ext2 I should be able to run tux3, right? 2008-10-19 16:19 it = if 2008-10-19 16:21 possibly 2008-10-19 16:21 you need to emulate the bio interface 2008-10-19 16:21 should not be that hard 2008-10-19 16:21 I'll got to that in ext2 I guess... 2008-10-19 16:21 minix doesn't use it 2008-10-19 16:21 not really 2008-10-19 16:22 ext2 doesn't use bios directly 2008-10-19 16:22 last time I checked 2008-10-19 16:22 ext3? :D 2008-10-19 16:22 flips: not script, just some droppings in my .bash_history 2008-10-19 16:22 probably script worthy though 2008-10-19 16:22 shapor. scrape .bash_history ? 2008-10-19 16:22 yeah 2008-10-19 16:23 before it rides off into the sunset 2008-10-19 16:23 speaking of which 2008-10-19 16:23 copied it so it wouldn't get rotated out 2008-10-19 16:23 getting close to sk8 thirty 2008-10-19 16:23 i'm kinda beat just rode 120 miles 2008-10-19 16:23 next time we have clouds I'll bring the camera down to the beach again 2008-10-19 16:23 mostly on the edge of my tires ;) 2008-10-19 16:23 gotta clean the sensor 2008-10-19 16:23 heh 2008-10-19 16:24 gotta make sure you don't just wear the middles 2008-10-19 16:25 whoa, it's cool out, have to wear an extra layer 2008-10-19 16:25 winter is setting in in socal 2008-10-19 16:26 71F in my lab right now :( 2008-10-19 16:30 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-19 16:34 saw mongol the russian movie last night 2008-10-19 16:34 way excellent 2008-10-19 16:35 blows away any hollywood release in the last couple years imho 2008-10-19 16:35 http://www.imdb.com/title/tt0416044/ ? 
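On emulating the bio interface in userspace, here is a minimal sketch of what such a shim might look like. It assumes a request is a starting sector plus a vector of memory segments, and that it completes through a callback. Everything here (`struct ubio`, `submit_ubio`, `submit_ubio_wait`, the in-memory `udisk`) is hypothetical. The field names only loosely echo the kernel's `bi_sector`, `bi_io_vec` and `bi_end_io`, and the real `struct bio` carries much more state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512

/* One memory segment of a request (loosely like the kernel's bio_vec). */
struct ubio_vec {
    void *base;
    size_t len;                     /* bytes; assumed a multiple of SECTOR_SIZE */
};

/* Hypothetical userspace stand-in for struct bio. */
struct ubio {
    unsigned long sector;           /* starting 512-byte sector on the "disk" */
    int write;                      /* 0 = read, 1 = write */
    struct ubio_vec *vec;
    unsigned vcnt;
    void (*end_io)(struct ubio *bio, int error);   /* completion callback */
    void *private;                  /* caller's cookie */
};

/* The "device": a flat in-memory disk image. */
struct udisk {
    unsigned char *image;
    size_t size;
};

/* Service the request synchronously, then run the completion hook,
 * mimicking the submit-then-complete shape of submit_bio(). */
static void submit_ubio(struct udisk *disk, struct ubio *bio)
{
    size_t pos = (size_t)bio->sector * SECTOR_SIZE;
    int error = 0;

    for (unsigned i = 0; i < bio->vcnt; i++) {
        size_t len = bio->vec[i].len;
        if (pos + len > disk->size) {
            error = -1;             /* request runs off the end of the device */
            break;
        }
        if (bio->write)
            memcpy(disk->image + pos, bio->vec[i].base, len);
        else
            memcpy(bio->vec[i].base, disk->image + pos, len);
        pos += len;                 /* segments are contiguous on disk */
    }
    bio->end_io(bio, error);
}

/* Completion hook used by submit_ubio_wait(): record the result. */
static void ubio_end_io_sync(struct ubio *bio, int error)
{
    int *result = bio->private;
    *result = error;
}

/* Convenience wrapper: submit and return the completion status. */
static int submit_ubio_wait(struct udisk *disk, struct ubio *bio)
{
    int result = 1;                 /* 1 = not completed yet */
    bio->end_io = ubio_end_io_sync;
    bio->private = &result;
    submit_ubio(disk, bio);
    return result;
}
```

Servicing the request synchronously keeps the shim trivial. A closer emulation could queue requests and complete them later, so that code depending on asynchronous completion gets exercised.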
2008-10-19 16:37 yes 2008-10-19 16:37 underrated on english speaking sites I think 2008-10-19 16:38 http://www.apple.com/trailers/picturehouse/mongol/ 2008-10-19 16:38 looks cool :D 2008-10-19 16:39 apparently mongolians didn't like it and russians loved it 2008-10-19 16:40 btw, have you watched: http://www.imdb.com/title/tt1032846/ ? 2008-10-19 16:41 no, never heard of it 2008-10-19 16:42 it's dark... 2008-10-19 16:42 looks like a must see 2008-10-19 16:42 one of the best Romanian movie 2008-10-19 16:43 hmm, genghis kahn was played by a japanese 2008-10-19 16:43 looks pretty mongolian, that has something to be with being conquered perhaps 2008-10-19 16:44 probably also has something to do with mongolians hating it 2008-10-19 16:44 :-) 2008-10-19 16:48 the military logic is to history reality as hollywood westerns are to actual cow pies 2008-10-19 17:03 rasvanm, could you email me the original of http://farm4.static.flickr.com/3011/2951108022_9cc58e9464_b.jpg ? 2008-10-19 17:03 the psd file? 2008-10-19 17:04 whatever it is 2008-10-19 17:04 I wonder if gimp can read that 2008-10-19 17:04 apparently yes 2008-10-19 17:05 it might, I only used some simple filters 2008-10-19 17:08 the psd is ~90MB and is here: http://cs.jhu.edu.edu/~razvanm/carousel.psd 2008-10-19 17:08 the full size jpg is here http://farm4.static.flickr.com/3011/2951108022_fd19ba9383_o.jpg 2008-10-19 17:09 I should have spend some time to fix the carousel fringes... 2008-10-19 17:09 razvanm, later tonight ;) 2008-10-19 17:10 :D 2008-10-19 17:10 I like the fringes 2008-10-19 17:10 are you going to take more? 2008-10-19 17:10 it's 8 ;-) 2008-10-19 17:10 gives it that misaligned lithograph look 2008-10-19 17:11 some (usually dark) nice pictures http://www.dianevarner.com/ 2008-10-19 17:11 90 mb is gross 2008-10-19 17:11 I wonder how above achieves that, xml? 
2008-10-19 17:11 let me check 2008-10-19 17:12 -rw-r--r-- 1 razvanm users 54747319 Oct 18 00:27 carousel.psd.gz 2008-10-19 17:12 there is some xml inside in some parts 2008-10-19 17:13 http://cs.jhu.edu.edu/~razvanm/carousel.psd.gz 2008-10-19 17:14 90 mb from 7 MB raw is unconscionable 2008-10-19 17:14 why I don't like proprietary/evil 2008-10-19 17:14 even when shiny 2008-10-19 17:15 it's not rgb... I converted to lab colors :P 2008-10-19 17:15 and I also make a layer using one the channels 2008-10-19 17:15 and I also used some gradient on a mask or two 2008-10-19 17:15 and I think that is stored as bitmap :P 2008-10-19 17:16 isn't the _o what you need? 2008-10-19 17:18 _o ? 2008-10-19 17:19 RazvanM: the full size jpg is here http://farm4.static.flickr.com/3011/2951108022_fd19ba9383_o.jpg 2008-10-19 17:19 http://cs.jhu.edu.edu/~razvanm/carousel.psd.gz gives 404 to wget and top level url to firefox 2008-10-19 17:19 looks like a censor job ;) 2008-10-19 17:20 bandwidth police maybe 2008-10-19 17:20 sorry.. I put two .edu-s :D 2008-10-19 17:21 http://cs.jhu.edu/~razvanm/carousel.psd.gz 2008-10-19 17:21 heh 2008-10-19 17:22 now it works, right? 2008-10-19 17:35 18 more calls... 2008-10-19 17:57 14 more... 2008-10-19 18:06 9 more... 2008-10-19 18:13 5 more... 2008-10-19 18:23 1 more... 
2008-10-19 18:33 0 :P 2008-10-19 18:34 Bus error ;-) 2008-10-19 18:57 time to go home 2008-10-19 20:16 razvanm went home on the bus 2008-10-19 20:33 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-19 21:10 -!- mint(~mint@71-90-82-221.dhcp.stpt.wi.charter.com) has joined #tux3 2008-10-19 21:10 sup 2008-10-19 21:10 going on bitches 2008-10-19 21:12 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-19 21:42 flips: reading postings from tso 2008-10-19 21:42 regarding btrfs, it's also /.-ed 2008-10-19 22:00 flips: I wouldn't be discouraged 2008-10-19 22:00 just keep at it 2008-10-19 22:58 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-19 23:00 need a wiki editor 2008-10-19 23:00 http://en.wikipedia.org/wiki/Versioning_file_system 2008-10-19 23:32 guess tim didn't notice the [edit] link 2008-10-19 23:32 hey flips 2008-10-19 23:32 i think tim_dimm wanted someone to edit that... 2008-10-19 23:32 anyway, we don't get to go in there until it actually versions imho 2008-10-19 23:32 as in a person 2008-10-19 23:32 :) 2008-10-19 23:33 ah, we should bring our own more up to date 2008-10-19 23:33 http://en.wikipedia.org/wiki/Tux3 2008-10-19 23:33 who in here is Mr. Phillips? 2008-10-19 23:33 wild guess? 2008-10-19 23:34 flips: why are we not using rbtree in our fs? 2008-10-19 23:34 you! 2008-10-19 23:34 ah 2008-10-19 23:34 yeah right click confirmed that 2008-10-19 23:37 flips, if you don't mind me asking, how old are you? 2008-10-19 23:38 ha I went to google-stalk you 2008-10-19 23:38 FelipeS: i had no luck doing that :) 2008-10-19 23:38 I typed daniel phillips and google's "autocomplete" feature associated you with linux 2008-10-19 23:39 well I just found out he has a degreen in music 2008-10-19 23:40 from the Univ of British Columbia 2008-10-19 23:40 which he obtained in 1975 2008-10-19 23:40 lets say 1975 - 20 2008-10-19 23:40 1955 ? 
2008-10-19 23:42 :D 2008-10-19 23:42 exactly 2008-10-19 23:42 good sneaky work, we need you :) 2008-10-19 23:43 was that really you? 2008-10-19 23:43 nah we all just need Google 2008-10-19 23:45 flips: rbtree? 2008-10-19 23:46 rbtree? 2008-10-19 23:46 htree maybe 2008-10-19 23:46 hmm 2008-10-19 23:46 not sure what you're talking about 2008-10-19 23:46 ACTION really doesnt knw a htree :( 2008-10-19 23:46 hmm 2008-10-19 23:46 i see that u are using a btree in tux3 2008-10-19 23:47 several 2008-10-19 23:47 yeah... 2008-10-19 23:47 so is a htree or rbtree better than a plain btree? 2008-10-19 23:49 rbtree is not a btree 2008-10-19 23:49 ! 2008-10-19 23:49 and a plain btree is not great for directory indexing 2008-10-19 23:50 rbtree is not? 2008-10-19 23:50 it has an extra color bit, everything else i guess makes it a btree? 2008-10-19 23:51 an rb tree is a type of binary tree 2008-10-19 23:51 a btree is not a binary tree, it is an nary tree 2008-10-19 23:51 oh :) 2008-10-19 23:52 the "b" in btree most likely stands for "balanced", but nobody knows for sure 2008-10-19 23:52 hmm.. thats something i learnt now.. 
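Here is a concrete version of the binary vs n-ary distinction. A red-black tree node has exactly two child pointers plus a color bit, while a B-tree node holds many keys and children. For on-disk structures, where every level visited may cost a block read, height is what matters. The sketch below assumes a fanout of 256 purely for illustration (real fanout depends on block and entry size). It computes the ideal minimum height, the smallest h with fanout^h >= n; a red-black tree can be up to roughly twice the binary minimum.

```c
#include <assert.h>
#include <stdint.h>

/* Red-black tree node: two children plus one color bit, i.e. binary. */
struct rb_node_sketch {
    struct rb_node_sketch *left, *right;
    int red;
    uint64_t key;
};

/* B-tree node: up to FANOUT children and FANOUT - 1 separator keys. */
#define FANOUT 256                  /* assumed; real fanout depends on block size */
struct btree_node_sketch {
    unsigned nkeys;
    uint64_t keys[FANOUT - 1];
    struct btree_node_sketch *child[FANOUT];
};

/* Smallest height h such that a tree whose nodes have `fanout` children
 * can reach at least n entries, i.e. the smallest h with fanout^h >= n. */
static unsigned min_height(uint64_t n, uint64_t fanout)
{
    unsigned h = 0;
    uint64_t reach = 1;

    while (reach < n) {
        reach *= fanout;
        h++;
    }
    return h;
}
```

For a million entries this gives 20 levels at fanout 2 versus 3 at fanout 256.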
2008-10-19 23:52 ACTION looking for htree 2008-10-20 01:41 -!- Kirantpatil(~kiran@122.167.206.199) has joined #tux3 2008-10-20 01:42 -!- Kirantpatil(~kiran@122.167.206.199) has left #tux3 2008-10-20 01:43 -!- macan(~chatzilla@159.226.41.129) has joined #tux3 2008-10-20 01:49 hey flips 2008-10-20 01:58 -!- stargazr5(~gauravstt@59.95.52.222) has joined #tux3 2008-10-20 03:08 threading in C++ sucks 2008-10-20 06:14 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-20 06:45 -!- marcin(~marcin@c-76-23-106-132.hsd1.sc.comcast.net) has joined #tux3 2008-10-20 07:02 -!- bobby(~bobby@nat-inn.mentorg.com) has joined #tux3 2008-10-20 07:26 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-20 08:49 -!- mingming(~mingming@c-24-22-117-202.hsd1.or.comcast.net) has joined #tux3 2008-10-20 09:14 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-20 10:27 -!- mingming(~mingming@32.97.110.51) has joined #tux3 2008-10-20 10:35 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-20 10:45 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-20 11:53 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-20 13:42 ACTION is implementing read_cache_page... 2008-10-20 15:53 should I not try to upgrade my inkscape from sid? 2008-10-20 15:55 I have 0.46-1+b1 and the latest one in sid is... 0.46-2.1 2008-10-20 16:50 my god... inkscape is awesome! 2008-10-20 17:02 failed to install on etch 2008-10-20 17:03 last time I used it it was sodipodi and it was promising 2008-10-20 17:03 pardon me, it in installs and fails to run 2008-10-20 17:04 can't find library libgc.so.1 2008-10-20 17:05 but I have /usr/lib/libgc.so.1.0.2 2008-10-20 17:05 so I guess the library was installed incorrectly 2008-10-20 17:08 indeed that was the issue, fixed by reinstall :-/ 2008-10-20 17:08 http://cs.jhu.edu/~razvanm/get_sb.png 2008-10-20 17:08 what do you think? 
:P 2008-10-20 17:08 I think inkspace was forked from sodipodi long time 2008-10-20 17:08 nice arrows 2008-10-20 17:08 not sure what they mean 2008-10-20 17:09 the tip of the arrow is the return of the function 2008-10-20 17:09 it's incredibly stupid that svg graphics do not scale with ctrl +/- 2008-10-20 17:09 should I upload an svg? :D 2008-10-20 17:09 oh 2008-10-20 17:09 thought it was ;) 2008-10-20 17:09 sure 2008-10-20 17:09 try it 2008-10-20 17:10 for that matter, it's stupid that pngs and jpgs don't scale with ctrl +/- 2008-10-20 17:11 http://cs.jhu.edu/~razvanm/get_sb.svg 2008-10-20 17:11 displays fine 2008-10-20 17:11 hey flips 2008-10-20 17:11 I also have on paper the init_module part 2008-10-20 17:12 does the ctrl +/- works with svg? :D 2008-10-20 17:12 it doesn't in safari... 2008-10-20 17:12 ACTION is cleaning up around the house today and will work tonight 2008-10-20 17:13 it does in firefox :D 2008-10-20 17:13 neat 2008-10-20 17:13 btw, does the calls look right? :D 2008-10-20 17:13 for me in firefox it only scales the svg text 2008-10-20 17:13 which looks really stupid 2008-10-20 17:14 hm... scales fine in mine... 2008-10-20 17:14 iceweasel 3.0 2008-10-20 17:14 so the FS is on the left and VFS on the right 2008-10-20 17:14 what is the significance of arrow going to the right vs to the left? 2008-10-20 17:14 ok 2008-10-20 17:15 I should also add the init_module + kmem_cache_create + register_filesystem on top... 2008-10-20 17:15 let me do that... 2008-10-20 17:15 btw, it's close to sk8 o'clock :P 2008-10-20 17:16 where does _fill_super go? 2008-10-20 17:16 you mean that by just fill_super? 2008-10-20 17:16 it is 2008-10-20 17:16 leaving in 4 minutes 2008-10-20 17:17 the fill_super is called from the get_sb_dev by VFS 2008-10-20 17:17 right? 
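The call chain in the diagram has a callback at its core. In the 2.6.26 era, as I understand it, a filesystem's `->get_sb` calls the VFS helper `get_sb_bdev(fs_type, flags, dev_name, data, fill_super, mnt)`. That helper sets up the superblock for the device and then calls back into the filesystem's `fill_super(sb, data, silent)` to read the on-disk super. The userspace mock below keeps only that shape. It drops `fs_type` and `mnt`. The `mock_`/`sketch_` names, `MOCK_MS_SILENT` and the magic number are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_MS_SILENT 0x8000       /* hypothetical stand-in for MS_SILENT */

/* Drastically cut-down stand-in for struct super_block. */
struct super_block {
    const char *s_id;               /* device name */
    unsigned long s_magic;          /* filled in by the filesystem */
    int s_active;                   /* 1 once mounted */
};

/* Same shape as the 2.6-era fill_super callback. */
typedef int (*fill_super_t)(struct super_block *sb, void *data, int silent);

/* Mock of the VFS helper: set up a superblock for the device, then call
 * back into the filesystem to read and validate its on-disk super. */
static int mock_get_sb_bdev(const char *dev_name, int flags, void *data,
                            fill_super_t fill_super, struct super_block *sb)
{
    memset(sb, 0, sizeof *sb);
    sb->s_id = dev_name;            /* the real helper also opens the bdev */
    int err = fill_super(sb, data, (flags & MOCK_MS_SILENT) ? 1 : 0);
    if (err)
        return err;                 /* the real helper tears the sb down here */
    sb->s_active = 1;
    return 0;
}

#define SKETCH_MAGIC 0x74757833UL   /* hypothetical: ASCII "tux3" */

/* Filesystem side: fill_super does the fs-specific work... */
static int sketch_fill_super(struct super_block *sb, void *data, int silent)
{
    (void)silent;
    if (data && strcmp((const char *)data, "fail") == 0)
        return -1;                  /* pretend the on-disk super was bad */
    sb->s_magic = SKETCH_MAGIC;
    return 0;
}

/* ...and ->get_sb is a thin wrapper passing fill_super to the helper. */
static int sketch_get_sb(const char *dev_name, int flags, void *data,
                         struct super_block *sb)
{
    return mock_get_sb_bdev(dev_name, flags, data, sketch_fill_super, sb);
}
```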
2008-10-20 17:20 the _fill_super, or ->fill_super 2008-10-20 17:20 ->fill_super :D 2008-10-20 17:20 or as I write it sometimes, ->fill_super -> _fill_super 2008-10-20 17:20 sk8 oclock 2008-10-20 17:21 enjoy 2008-10-20 17:21 ACTION straps on skates 2008-10-20 17:22 see you at the top of seaside? 2008-10-20 17:22 actually, the fill_super is passed as a parameter to get_sb_bdev 2008-10-20 17:22 probalby for your second run 2008-10-20 17:22 if you survive the first 2008-10-20 17:22 yup 2008-10-20 17:45 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-20 19:49 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-20 20:13 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-10-20 20:19 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-20 20:27 -!- less(~less@145.116.238.192) has joined #tux3 2008-10-20 22:02 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-20 22:55 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-21 01:09 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-21 01:28 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-21 04:20 -!- mlankhorst(~m@fw1.astro.rug.nl) has joined #tux3 2008-10-21 08:43 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-21 08:57 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-21 09:10 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-21 09:46 -!- mingming(~mingming@32.97.110.51) has joined #tux3 2008-10-21 10:01 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-21 10:39 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-21 12:37 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-21 12:47 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-21 13:59 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined 
#tux3 2008-10-21 15:30 ACTION is reading Chapter 15 from Understanding the Linux Kernel... 2008-10-21 16:09 ah, never looked at it to tell the truth 2008-10-21 16:09 mostly it just reads out the source code to us without interpretation 2008-10-21 16:10 it's modern enough to know that submit_bh calls submit_bio 2008-10-21 16:24 hey 2008-10-21 16:25 sk8 oclock 2008-10-21 16:27 enjoy : 2008-10-21 16:28 utlk is actually pretty good on the page IO life cycle 2008-10-21 16:28 is missing the recent stuff on dirty page limits and all the changes related to that 2008-10-21 16:28 which were pretty major 2008-10-21 16:30 utlk? 2008-10-21 16:31 understanding the linux kernel 2008-10-21 16:48 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-21 16:51 uh :D 2008-10-21 16:51 I though it will be some sort of tool 2008-10-21 18:47 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-21 18:50 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-21 19:07 -!- macan(~chatzilla@159.226.41.129) has joined #tux3 2008-10-21 19:54 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-21 19:54 ACTION is getting ready... 2008-10-21 19:58 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-21 19:58 hi 2008-10-21 20:00 hi raluca 2008-10-21 20:00 how'd you like the photo gallery? 2008-10-21 20:00 http://phunq.net/sunset 2008-10-21 20:01 not in the ballpark of razvan's skillz, but... 2008-10-21 20:01 hmm, seems like "next session" was last session 2008-10-21 20:01 any maze? 2008-10-21 20:02 let's see if 2.6.27 is indexed yet 2008-10-21 20:02 yes 2008-10-21 20:02 ok it is 2008-10-21 20:03 hello 2008-10-21 20:03 hi hirofumi 2008-10-21 20:03 hi 2008-10-21 20:03 let's take a look at how fast path writepages works 2008-10-21 20:03 start with ext3_writepages 2008-10-21 20:03 remind me to ask a question about halloween afterwards... 
and flips, you're not really out are you ;-)? 2008-10-21 20:04 who me? 2008-10-21 20:04 flips: oh, I didn't see the gallery yet, let me check 2008-10-21 20:04 ->writepages is an address_space_operation 2008-10-21 20:05 meaning, associated with struct mapping 2008-10-21 20:05 err 2008-10-21 20:05 with struct address_space 2008-10-21 20:05 usually referenced by ->mapping 2008-10-21 20:05 are we at a specific place in the code? 2008-10-21 20:05 fun skew there 2008-10-21 20:05 we will be soon 2008-10-21 20:06 http://lxr.linux.no/linux+v2.6.26.6/fs/ext3/inode.c 2008-10-21 20:07 http://lxr.linux.no/linux+v2.6.26.6/fs/ext3/inode.c#L1769 2008-10-21 20:07 that's ext3_readpages 2008-10-21 20:07 ext3 doesn't support ->writepages 2008-10-21 20:08 I guess thats why I couldn't find it :P 2008-10-21 20:08 I should have looked in the struct from the start 2008-10-21 20:08 interesting question, why it's in ext2 and not ext3 2008-10-21 20:08 I'd guess it's the journaling 2008-10-21 20:09 makes it much harder to support 2008-10-21 20:09 because ext2 is old? 
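Here is how `->writepages` hangs off a page. Each cached page points at its `struct address_space` through `->mapping`, and the address space points at the filesystem's method table through `->a_ops`. The cut-down sketch below keeps only that pointer chain and three hooks. The `_sketch` names and dummy methods are hypothetical. The ext2-like vs ext3-like tables just mirror the observation that ext3 leaves `->writepages` unset.

```c
#include <assert.h>
#include <stddef.h>

struct page;
struct address_space;

/* Cut-down address_space_operations: only the hooks discussed here. */
struct aops_sketch {
    int (*writepage)(struct page *page);
    int (*writepages)(struct address_space *mapping, long nr_to_write);
    int (*readpage)(struct page *page);
};

struct address_space {
    const struct aops_sketch *a_ops; /* per-filesystem method table */
    const char *owner;               /* stand-in for mapping->host */
};

struct page {
    struct address_space *mapping;   /* whose page cache this page is in */
    unsigned long index;             /* page offset within that file */
};

/* Find the method table that governs a cached page. */
static const struct aops_sketch *page_aops(const struct page *page)
{
    return page->mapping->a_ops;
}

static int dummy_writepage(struct page *page) { (void)page; return 0; }
static int dummy_readpage(struct page *page) { (void)page; return 0; }
static int dummy_writepages(struct address_space *m, long n) { (void)m; return (int)n; }

/* ext2-like table supplies ->writepages; ext3-like leaves it NULL. */
static const struct aops_sketch ext2_like_aops = { dummy_writepage, dummy_writepages, dummy_readpage };
static const struct aops_sketch ext3_like_aops = { dummy_writepage, NULL, dummy_readpage };
```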
2008-10-21 20:09 because ext3 has more rules about writing I think 2008-10-21 20:09 you'll note writepage has 3 different implementations for ext3 2008-10-21 20:09 and the vfs writepages doesn't know those rules 2008-10-21 20:09 we'll get more specifc about that later 2008-10-21 20:09 oh yes 2008-10-21 20:10 so ext3_readpages is just a wrapper for the library function 2008-10-21 20:10 mpage_readpages 2008-10-21 20:10 where it supplies its *_get_block function 2008-10-21 20:10 as is ext3_readpage 2008-10-21 20:11 yes 2008-10-21 20:11 and not the way tux3 is structured at the moment 2008-10-21 20:11 and possibiliy we will avoid creating tux3_get_block and using that whole library interface 2008-10-21 20:11 ACTION arrives fashionably late 2008-10-21 20:11 I'm leaning in that direction 2008-10-21 20:12 very fashionable 2008-10-21 20:12 http://lxr.linux.no/linux+v2.6.26.6/fs/mpage.c#L371 2008-10-21 20:12 it builds up a bio containing a bunch of pages instead of just one 2008-10-21 20:13 wait a moment, which direction are you leaning in? (and what directions are there?) 2008-10-21 20:13 which saves merging in the block elevator among onther things 2008-10-21 20:13 the direction I'm leaning in is not to have a tux3_get_block 2008-10-21 20:13 and therfore not using any library function that expects a get_block callback 2008-10-21 20:13 those library functions being ancient, crufty 2008-10-21 20:14 in practice I may find it's impractical, or it's totally practical 2008-10-21 20:14 don't know yet 2008-10-21 20:14 ah 2008-10-21 20:14 homework assignment? ;) 2008-10-21 20:14 so basically implementing readpage(s) manually? 
2008-10-21 20:14 right 2008-10-21 20:15 like be beloved romfs :D 2008-10-21 20:15 be = my 2008-10-21 20:15 maybe 2008-10-21 20:15 probably work the effort 2008-10-21 20:15 the callback mess is really a mess, and the concept of the get_block interface is kind of broken 2008-10-21 20:15 it implies some place to cache the physical address 2008-10-21 20:16 whereass that really should be the business of the fs 2008-10-21 20:16 ext2 read/write page/pages just look like wrappers too 2008-10-21 20:16 http://lxr.linux.no/linux+v2.6.26.6/fs/mpage.c#L371 <- we're here now 2008-10-21 20:16 yes, because it's a non-journalled file system, which actually has a read/write block interface 2008-10-21 20:17 one thing we don't know from looking here, is where the list of pages we're writing came from 2008-10-21 20:17 mpage_readpage(s) are pretty much the same thing 2008-10-21 20:18 we write pages when they are dirty write? 2008-10-21 20:18 and we want to use them for something else 2008-10-21 20:18 anyway, we're going to write the whole list, and if we're lucky, the list refers to pages contiguous on disk 2008-10-21 20:18 athough apparently there's no page cache lru interaction in the single page version 2008-10-21 20:18 because a single bio can only handle contiguous pages 2008-10-21 20:18 http://lxr.linux.no/linux+v2.6.26.6/fs/ext3/inode.c#L1423 eek 2008-10-21 20:19 there's no lru interaction in the multipage version either 2008-10-21 20:19 nope 2008-10-21 20:19 do_mpage_readpage can submit bios 2008-10-21 20:19 shapor, odd indeed 2008-10-21 20:20 so the data does not have to fit in a single bio 2008-10-21 20:20 http://lxr.linux.no/linux+v2.6.26.6/fs/mpage.c#L384 <- nano optimization 2008-10-21 20:20 somebody must have measured it and determined it actually matters 2008-10-21 20:20 warming up a cache line ahead of time 2008-10-21 20:21 what do you mean theres no lru interaction? 
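The wrapper-plus-library structure described here can be sketched in userspace. `ext3_readpages` hands `ext3_get_block` to `mpage_readpages`, and one bio only carries blocks that are contiguous on disk. This sketch assumes block size equals page size, so each page maps to one block. The callback signature, `readpages_sketch` and the example block map are hypothetical simplifications, not the kernel interface; the real `get_block` fills in a `buffer_head`.

```c
#include <assert.h>
#include <stddef.h>

typedef long sector_t_sketch;

/* get_block-style callback (hypothetical simplified signature): map
 * logical block `iblock` of the file to a physical block, 0 on success. */
typedef int (*get_block_fn)(unsigned long iblock, sector_t_sketch *phys);

struct bio_sketch {
    sector_t_sketch first;          /* first physical block */
    unsigned npages;                /* pages (= blocks) carried */
};

/* Library side: walk the pages in order; keep adding to the current bio
 * while the next page lands right after it on disk, otherwise "submit"
 * the current bio and start a new one. Returns the number of bios. */
static unsigned readpages_sketch(const unsigned long *index, unsigned nr_pages,
                                 get_block_fn get_block, struct bio_sketch *bios)
{
    unsigned nbios = 0;
    struct bio_sketch *cur = NULL;

    for (unsigned i = 0; i < nr_pages; i++) {
        sector_t_sketch phys;
        if (get_block(index[i], &phys) != 0)
            continue;               /* real code takes a slower path; we skip */
        if (cur && cur->first + (sector_t_sketch)cur->npages == phys) {
            cur->npages++;          /* contiguous: merge into current bio */
        } else {
            cur = &bios[nbios++];   /* discontiguous: start a new bio */
            cur->first = phys;
            cur->npages = 1;
        }
    }
    return nbios;
}

/* Hypothetical filesystem block map: logical block i -> physical block. */
static const sector_t_sketch example_map[] = { 100, 101, 102, 200, 201, 50 };

static int example_get_block(unsigned long iblock, sector_t_sketch *phys)
{
    if (iblock >= sizeof example_map / sizeof example_map[0])
        return -1;
    *phys = example_map[iblock];
    return 0;
}

/* The ext3_readpages pattern: a one-line wrapper that hands the library
 * the filesystem's own get_block. */
static unsigned example_readpages(const unsigned long *index, unsigned nr_pages,
                                  struct bio_sketch *bios)
{
    return readpages_sketch(index, nr_pages, example_get_block, bios);
}
```

With the map above, six pages become three bios: blocks 100-102, 200-201 and 50. Roughly speaking, each break in physical contiguity is where `do_mpage_readpage` sends the current bio off and starts another.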
2008-10-21 20:21 ok, so what's the add_to_page_cache_lru all about 2008-10-21 20:22 the bio also gets allocated inside do_mpage_readpage 2008-10-21 20:23 but the last submit happens outside of the main loop 2008-10-21 20:23 we peek into the page cache, if there's no page there we read it 2008-10-21 20:23 that's what that's doing 2008-10-21 20:23 why do we need that only in the multiple page case? 2008-10-21 20:23 the assumption: if we find a page, it must be either uptodate or dirty 2008-10-21 20:23 because the readpage won't be called on an uptodate page 2008-10-21 20:24 basically, no point in reading data we already have in the page cache 2008-10-21 20:24 http://lxr.linux.no/linux+v2.6.26.6/fs/mpage.c#L168 <- here's the big rambling hack 2008-10-21 20:24 why does it matter if its dirty? 2008-10-21 20:24 either way if its in page cache, just use that copy, no? 2008-10-21 20:24 it's going to deal with issues like noncontiguous physical disk locations 2008-10-21 20:25 dirty or update, either way, don't have to read it 2008-10-21 20:25 dirty or uptodate I meant 2008-10-21 20:25 right 2008-10-21 20:25 right 2008-10-21 20:26 let's skim through this quickly and see if there's anything interesting 2008-10-21 20:26 which function are we skimming through? 2008-10-21 20:26 188 if (page_has_buffers(page)) 2008-10-21 20:26 189 goto confused; <- now why would that be 2008-10-21 20:27 http://lxr.linux.no/linux+v2.6.26.6/fs/mpage.c#L168 2008-10-21 20:27 do_mpage_readpage 2008-10-21 20:27 big rambling hack 2008-10-21 20:27 fast 2008-10-21 20:27 unpretty 2008-10-21 20:27 akpm coded this whole file in a couple days as I recall 2008-10-21 20:28 yes 2008-10-21 20:28 shortly before being annointed mm czar ;) 2008-10-21 20:28 because we're not expecting there to be cached in mem data for the page we're reading 2008-10-21 20:29 because we're not expecting any filesystem that monkeys with buffers to touch this mapping? 
I don't know 2008-10-21 20:29 oh 2008-10-21 20:29 because we just added it 2008-10-21 20:29 and therefore it shouldn't have buffers 2008-10-21 20:30 could write BUG there, it would have to be a race 2008-10-21 20:30 http://lxr.linux.no/linux+v2.6.26.6/fs/mpage.c#L350 2008-10-21 20:30 another hack 2008-10-21 20:30 it's a shame this function gets called one page at a time 2008-10-21 20:30 must move the cpu needle 2008-10-21 20:31 that's where I think we're going to do the whole bio prep look in tux3 2008-10-21 20:31 instead of interfacing to the library 2008-10-21 20:31 it's similar to the code already in filemap.c 2008-10-21 20:32 http://lxr.linux.no/linux+v2.6.26.6/fs/mpage.c#L350 <- let's consider that issue later 2008-10-21 20:33 re tux3 2008-10-21 20:33 199 * Map blocks using the result from the previous get_blocks call first. 2008-10-21 20:34 nigh on unreadable 2008-10-21 20:35 my brain is sizzling... 2008-10-21 20:35 223 * Then do more get_blocks calls until we are done with this page. <- makes more sense 2008-10-21 20:35 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-21 20:35 from 223 we see the ->get_block calls 2008-10-21 20:35 one for each buffer on the page 2008-10-21 20:36 sorry 2008-10-21 20:36 for each block on the page 2008-10-21 20:36 because we're doing this without buffers 2008-10-21 20:36 that's the main point of it 2008-10-21 20:36 hrm 2008-10-21 20:36 avoids buffer oriented IO for most file data 2008-10-21 20:37 we're using a fake buffer 2008-10-21 20:37 called map_bh 2008-10-21 20:37 just so the ->get_block interface will work 2008-10-21 20:37 crufty? yes very 2008-10-21 20:37 hm get_block gets called alot 2008-10-21 20:38 247 /* some filesystems will copy data into the page during 2008-10-21 20:38 248 * the get_block call, <- for example, tail packing filessystem, for example reiserfs 2008-10-21 20:39 290 * This page will go to BIO. Do we need to send this BIO off first? 
<- what happens if we hit discontiguous blocks 2008-10-21 20:39 the entire vfs interface/libraries really seem to be optimized/written for older 'simpler' filesystems 2008-10-21 20:39 it's because we "evolve" linux 2008-10-21 20:39 with incremental changes 2008-10-21 20:40 generally helpful for stability, but not for structure 2008-10-21 20:40 and then all the newer filesystems basically reimplement this or skip parts of it 2008-10-21 20:40 true 2008-10-21 20:40 the whole page, block, buffer seems like it should be much simpler 2008-10-21 20:40 major cut & paste culture 2008-10-21 20:40 mapping* 2008-10-21 20:40 nobody wants to read/understand this shit ;) 2008-10-21 20:40 hmm... isn't this the way MS does with their OS :P 2008-10-21 20:40 shapor, yes way simpler 2008-10-21 20:40 it's fairly obscene at the moment 2008-10-21 20:40 perhaps, but how many fs'es does MS support? 2008-10-21 20:40 I don't think akpm would argue that 2008-10-21 20:41 maze, a fraction of linux 2008-10-21 20:41 I was thinking about the OS not the FS part ;-) 2008-10-21 20:41 one thing bsders seem to tout is the fact they've done away with buffer heads 2008-10-21 20:41 I should check out how they went about that 2008-10-21 20:41 had some discussion with dilon about it 2008-10-21 20:42 bit didn't follow up by reading code 2008-10-21 20:42 I got the impression... by implementing a new, xfs like layer 2008-10-21 20:42 sounds like they just hid them then 2008-10-21 20:43 199 * Map blocks using the result from the previous get_blocks call first. <- ok I think I grok this now 2008-10-21 20:43 the filesystem is free to go ahead and map more blocks than the one asked for 2008-10-21 20:45 map in what sense? 
2008-10-21 20:45 interesting project is to go trace the lifetime of map_bh through this code 2008-10-21 20:45 map as in call ->get_block 2008-10-21 20:45 to get a physical mapping, store it in the bh->block 2008-10-21 20:45 bh->b_blocknr I think it was 2008-10-21 20:46 and bh->b_size is how many blocks was mapped 2008-10-21 20:47 I didn't notice that 2008-10-21 20:47 good eyes 2008-10-21 20:47 very ugly hack 2008-10-21 20:48 really pushing the buffer interface past the breaking point 2008-10-21 20:48 yes, incrementale change 2008-10-21 20:48 we finished for _readpages for now? 2008-10-21 20:48 I sure hope so ;-) 2008-10-21 20:49 let's see if we can figure out why writepages is used by ext2 and not by ext3 in the next 11 minutes 2008-10-21 20:49 uhm, wild guess... journalling 2008-10-21 20:49 which would only matter for data-journalled - except 2008-10-21 20:50 for writes beyond eof and in holes 2008-10-21 20:50 of course, but that's not a sufficiently precise answer 2008-10-21 20:50 getting more precise 2008-10-21 20:50 you can't use the get_block interface? 2008-10-21 20:50 the proposal is "it only matters for data=journalled" 2008-10-21 20:50 for jbd, i think 2008-10-21 20:50 uhm, no. 
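The map_bh trick can be sketched like this. The caller sets `b_size` to how much it would like mapped. `get_block` may map a longer run than the single block asked about, and it reports the result in `b_blocknr` and `b_size`. In the kernel `b_size` is in bytes, so the number of blocks mapped is `b_size >> blkbits`. The caller then reuses that mapping for the following blocks before calling `get_block` again. The extent table, the 4 KiB block size and all `_sketch` names below are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define BLKBITS 12                  /* assumed 4 KiB blocks */

/* Cut-down stand-in for the fake buffer_head (map_bh). */
struct map_bh_sketch {
    long b_blocknr;                 /* first physical block of the mapped run */
    size_t b_size;                  /* bytes: requested on entry, mapped on return */
};

/* Hypothetical extent list for one file. */
struct extent_sketch { unsigned long logical; long physical; unsigned long count; };
static const struct extent_sketch extents[] = {
    { 0, 100, 4 },                  /* file blocks 0-3 -> disk 100-103 */
    { 4, 200, 4 },                  /* file blocks 4-7 -> disk 200-203 */
};

static unsigned get_block_calls;    /* how often the fs gets asked */

/* get_block-style mapper: may map more than one block, but no more than
 * requested and no further than the end of the containing extent. */
static int get_block_sketch(unsigned long iblock, struct map_bh_sketch *bh)
{
    get_block_calls++;
    for (size_t i = 0; i < sizeof extents / sizeof extents[0]; i++) {
        const struct extent_sketch *e = &extents[i];
        if (iblock >= e->logical && iblock < e->logical + e->count) {
            unsigned long want = bh->b_size >> BLKBITS;
            unsigned long avail = e->logical + e->count - iblock;
            unsigned long n = want < avail ? want : avail;
            bh->b_blocknr = e->physical + (long)(iblock - e->logical);
            bh->b_size = n << BLKBITS;
            return 0;
        }
    }
    return -1;                      /* hole or beyond EOF */
}

/* Caller side: map nblocks file blocks starting at `first`, reusing the
 * previous result while it still covers the next block. */
static int map_range_sketch(unsigned long first, unsigned long nblocks, long *phys)
{
    struct map_bh_sketch bh = { 0, 0 };
    unsigned long mapped_from = 0, mapped_count = 0;

    for (unsigned long i = 0; i < nblocks; i++) {
        unsigned long iblock = first + i;
        if (!(mapped_count && iblock >= mapped_from &&
              iblock < mapped_from + mapped_count)) {
            bh.b_size = (nblocks - i) << BLKBITS;  /* ask for the rest */
            if (get_block_sketch(iblock, &bh) != 0)
                return -1;
            mapped_from = iblock;
            mapped_count = bh.b_size >> BLKBITS;
        }
        phys[i] = bh.b_blocknr + (long)(iblock - mapped_from);
    }
    return 0;
}
```

Mapping blocks 0-7 of the two-extent file costs two `get_block` calls instead of eight. That saving is the point of letting the filesystem map ahead.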
2008-10-21 20:51 ok, to start with, writepages only works on data, not metadata 2008-10-21 20:51 the proposal was, it would only matter for data=journalled, except for write past eof, and sparse files, which is why it's always needed 2008-10-21 20:51 and hence the 3 different ext3 writepage implementations 2008-10-21 20:52 and no writepages implementation, which is the interesting question 2008-10-21 20:52 let's see under what conditions vfs calls ->writepages 2008-10-21 20:52 my guess is lack of writepages, means it falls back to using writepage one at a time 2008-10-21 20:52 perhaps writepages is too complicated to journalize 2008-10-21 20:53 actually data=ordered also needs special handling because of consistency guarantees it offers 2008-10-21 20:53 perhaps it can fail in too many ways :P 2008-10-21 20:53 or noone has bothered to yet ;-) 2008-10-21 20:53 (my guess) 2008-10-21 20:53 [since with journaling writes are slow anyways...] 2008-10-21 20:53 akpm would bother if it would make ext3 go faster 2008-10-21 20:54 so I'm rejecting that theory 2008-10-21 20:54 hmm, really? 2008-10-21 20:54 http://lxr.linux.no/linux+v2.6.26.6/mm/page-writeback.c#L1003 2008-10-21 20:54 you see the lengths that have gone to already 2008-10-21 20:54 we take our slight advantage over bsd seriously ;) 2008-10-21 20:54 right, so we use generic 2008-10-21 20:54 hmm? where's the advantage? 2008-10-21 20:55 so the answer may be: generic_writepages works for ext3, not for ext2 2008-10-21 20:55 maybe 2008-10-21 20:56 don't buy that 2008-10-21 20:56 anyway, we have found our way to the main place that pages are written in linux 2008-10-21 20:56 maybe the ext2 case could be more optimized? 2008-10-21 20:56 http://lxr.linux.no/linux+v2.6.26.6/fs/ext2/inode.c#L778 2008-10-21 20:56 http://lxr.linux.no/linux+v2.6.26.6/mm/page-writeback.c#L862 <- write_cache_pages 2008-10-21 20:56 let's see what is the generic one... 
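The guess that a missing `->writepages` means falling back to `->writepage` one page at a time matches the shape of `do_writepages()` and `generic_writepages()` in 2.6.26, as I read them. They use the filesystem's `->writepages` if present. Otherwise they run `write_cache_pages()` over the dirty pages with a per-page callback that ends up in `->writepage`. The userspace sketch below keeps only that control flow. The dirty array stands in for the radix-tree dirty tags. Page locking, writeback waits and the real `writeback_control` fields are omitted, and all `_sketch` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 8

struct mapping_sketch;

struct aops_sketch {
    /* write one page; 0 on success */
    int (*writepage)(struct mapping_sketch *m, unsigned long index);
    /* write many pages at once; NULL if the fs doesn't provide it */
    int (*writepages)(struct mapping_sketch *m, long *nr_to_write);
};

struct mapping_sketch {
    const struct aops_sketch *a_ops;
    int dirty[NPAGES];              /* stand-in for the dirty tags */
    unsigned writepage_calls;
};

/* Rough shape of write_cache_pages(): visit each dirty page and hand it
 * to ->writepage until the budget runs out. */
static int write_cache_pages_sketch(struct mapping_sketch *m, long *nr_to_write)
{
    for (unsigned long i = 0; i < NPAGES && *nr_to_write > 0; i++) {
        if (!m->dirty[i])
            continue;
        m->dirty[i] = 0;            /* clear dirty before writing it out */
        int err = m->a_ops->writepage(m, i);
        if (err)
            return err;
        (*nr_to_write)--;
    }
    return 0;
}

/* Rough shape of do_writepages(): prefer the fs's ->writepages, else
 * fall back to the generic per-page loop driven by ->writepage. */
static int do_writepages_sketch(struct mapping_sketch *m, long *nr_to_write)
{
    if (*nr_to_write <= 0)
        return 0;
    if (m->a_ops->writepages)
        return m->a_ops->writepages(m, nr_to_write);
    if (!m->a_ops->writepage)
        return 0;                   /* nothing the VFS can do */
    return write_cache_pages_sketch(m, nr_to_write);
}

static int count_writepage(struct mapping_sketch *m, unsigned long index)
{
    (void)index;
    m->writepage_calls++;
    return 0;
}

/* ext3-like table: ->writepage only, so writeback goes page by page. */
static const struct aops_sketch writepage_only_aops = { count_writepage, NULL };
```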
2008-10-21 20:56 _2copy will only get a few fringe cases, most ext3 traffic will go through here 2008-10-21 20:57 I suppose nobody got around to plugging generic_writepages into ext2 2008-10-21 20:57 block_dev.c: .writepages = generic_writepages, 2008-10-21 20:58 hmm, I don't like my latest theory either 2008-10-21 20:58 truth is, I don't know and with 2 minutes to go I'm declaring it homework 2008-10-21 20:58 that, and "read generic_writepages" 2008-10-21 20:59 :-) 2008-10-21 20:59 did we have fun today? 2008-10-21 20:59 we're certainly wading in it 2008-10-21 20:59 sinking... 2008-10-21 20:59 I feel it was shorter... 2008-10-21 20:59 one thing worth remembering: there's weird locking going on through all of this 2008-10-21 21:00 and scheduling 2008-10-21 21:00 in other words, we're taking a superficial view of it so far 2008-10-21 21:01 see, mpage_writepages is just a wrapper for write_cache_pages too 2008-10-21 21:02 for every one thing i learn in these sessions i find out about 10 more i have no clue about, makes it feel like a net loss ;) 2008-10-21 21:02 lol 2008-10-21 21:02 ACTION feels pretty much the same... 2008-10-21 21:02 there's also so much history behind how it all is... 2008-10-21 21:03 anyway what's up with halloween? 2008-10-21 21:04 and who's on these 2 photos? http://phunq.net/sunset/.1024/.html/woohoo.jpg.html and http://phunq.net/sunset/.1024/.html/wheee.jpg.html 2008-10-21 21:04 ok, write_cache_pages is for filesystems that don't supply a get_block, but do supply a ->writepage 2008-10-21 21:04 when is holloween? :D 2008-10-21 21:04 oct 31 2008-10-21 21:04 maze, we're making arrangements for something rather cool 2008-10-21 21:05 cool, but where and when? 2008-10-21 21:05 oct 31 2008-10-21 21:05 venice beach 2008-10-21 21:05 we can start early 2008-10-21 21:05 on 3rd street 2008-10-21 21:05 so Friday next week 2008-10-21 21:05 soon, yes 2008-10-21 21:05 expect email 2008-10-21 21:05 does venice beach, mean beach in venice, ca? 
2008-10-21 21:06 yes, just south of santa monica 2008-10-21 21:06 caveat: you need to be on the southwest size of the 405 before late afternoon 2008-10-21 21:06 ugh, that's even farther than Malibu... 2008-10-21 21:06 before early afternoon even 2008-10-21 21:06 I'm about 10 minutes from malibu 2008-10-21 21:06 MaZe: planning on being in malibu? 2008-10-21 21:06 15 maybe 2008-10-21 21:07 no, just Malibu has somehow always seemed to be as a tropical paradise on the other end of the world (back when I lived in eu) 2008-10-21 21:07 :) 2008-10-21 21:07 oh, 26 minutes it tells me 2008-10-21 21:08 depends from where in malibu 2008-10-21 21:08 malibu is 27 miles itself 2008-10-21 21:08 I can't think of as "where movie stars get arresting for driving their suvs drunk" 2008-10-21 21:08 355 miles 2008-10-21 21:08 i think of it as the gateway to the santa monica mountains 2008-10-21 21:09 all the nice roads, vrewm 2008-10-21 21:10 start early? around when? 2008-10-21 21:11 ah, ext3 has its own kernel feature: ->write_begin, ->write_end for order write 2008-10-21 21:11 also used by btrfs I think 2008-10-21 21:12 btrfs doesn't have 2008-10-21 21:12 planned to be used 2008-10-21 21:13 prepare_write? 2008-10-21 21:13 not really 2008-10-21 21:13 different thing 2008-10-21 21:14 having a hard time finding the call point in vfs 2008-10-21 21:14 e.g. pagecache_write_begin? 
2008-10-21 21:15 this one call: http://lxr.linux.no/linux+v2.6.26.6/drivers/block/loop.c#L769 2008-10-21 21:15 but surely that isn't the only one 2008-10-21 21:15 http://lxr.linux.no/linux+v2.6.26.6/mm/filemap.c#L1912 2008-10-21 21:15 hirofumi, right 2008-10-21 21:16 from cscope: 2008-10-21 21:16 fs/affs/file.c affs_truncate 829 res = mapping->a_ops->write_begin(NULL, mapping, size, 0, 0, &page, &fsdata); 2008-10-21 21:16 fs/ext4/inode.c ext4_page_mkwrite 4851 ret = mapping->a_ops->write_begin(file, mapping, page_offset(page), 2008-10-21 21:16 mm/filemap.c pagecache_write_begin 2020 return aops->write_begin(file, mapping, pos, len, flags, 2008-10-21 21:16 mm/filemap.c generic_perform_write 2429 status = a_ops->write_begin(file, mapping, pos, bytes, flags, 2008-10-21 21:16 well, btrfs seems to do it in a much different way 2008-10-21 21:17 ok, it's a wrapper for grab_cache_page and prepare_write, or a fs hook 2008-10-21 21:17 really crufty 2008-10-21 21:18 luckily, prepare_write will go away soon 2008-10-21 21:18 and everyone will use ->write_begin 2008-10-21 21:18 it will? never understood what it was for 2008-10-21 21:18 in the first place 2008-10-21 21:19 so I guess the answer is, it was always bogus 2008-10-21 21:20 three calls from buffer.c look like the only interesting ones 2008-10-21 21:21 hmm 2008-10-21 21:21 block_prepare_write()? not ->prepare_write 2008-10-21 21:21 those are fringe cases 2008-10-21 21:22 it's quite impressive how all this churn has happened and most filesystem code is barely affected 2008-10-21 21:23 recent bloatup in core is pretty scary 2008-10-21 21:23 ->prepare_write is replaced by ->write_begin 2008-10-21 21:24 ->commit_write was replaced by ->write_end 2008-10-21 21:31 flips, btw, do you already have ideas for buffer management?
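The ->write_begin / ->write_end pair shows up in generic_perform_write() as a per-page loop. Each page touched by a write goes through begin, then the copy from the user, then end. The sketch below shows only that loop shape. The toy_* names are hypothetical, the "page" is shrunk to 16 bytes so a short write crosses a page boundary, and there is no locking, allocation or short-copy handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16   /* tiny page so the example crosses pages */
#define TOY_NPAGES 4

struct toy_file {
	char pages[TOY_NPAGES][TOY_PAGE_SIZE];
	size_t size;        /* like i_size, updated in write_end */
	int begin_calls;
};

/* Hypothetical ->write_begin: find (in a real fs, also allocate and
 * prepare) the page covering pos and return it to the caller. */
static char *toy_write_begin(struct toy_file *f, size_t pos, size_t len)
{
	(void)len;
	f->begin_calls++;
	return f->pages[pos / TOY_PAGE_SIZE];
}

/* Hypothetical ->write_end: commit the copied bytes, extend the size. */
static size_t toy_write_end(struct toy_file *f, size_t pos, size_t copied)
{
	if (pos + copied > f->size)
		f->size = pos + copied;
	return copied;
}

/* Shape of the generic_perform_write() loop: one write_begin / copy /
 * write_end round trip per page touched. Returns bytes written. */
static size_t toy_perform_write(struct toy_file *f, size_t pos,
				const char *buf, size_t count)
{
	size_t written = 0;
	while (written < count) {
		if (pos / TOY_PAGE_SIZE >= TOY_NPAGES)
			break;  /* out of room in this toy file */
		size_t offset = pos % TOY_PAGE_SIZE;
		size_t bytes = TOY_PAGE_SIZE - offset;
		if (bytes > count - written)
			bytes = count - written;
		char *page = toy_write_begin(f, pos, bytes);
		memcpy(page + offset, buf + written, bytes);
		size_t copied = toy_write_end(f, pos, bytes);
		pos += copied;
		written += copied;
	}
	return written;
}
```

A 20-byte write at offset 10 touches two of these pages, so it makes two write_begin/write_end round trips.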
2008-10-21 21:32 hirofumi, yes 2008-10-21 21:33 oh, great 2008-10-21 21:33 hirofumi, it's the main topic of the post I've been working on for the last week 2008-10-21 21:33 hopefully I'll post in about 2 hours 2008-10-21 21:34 great :) 2008-10-21 21:43 -!- FelipeS_(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-21 21:46 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-21 21:52 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-21 21:53 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-21 21:54 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-22 00:47 -!- pgquiles(~pgquiles@71.Red-79-154-137.staticIP.rima-tde.net) has joined #tux3 2008-10-22 01:02 ACTION reads the backlog 2008-10-22 01:05 ACTION goes to bed 2008-10-22 01:13 night 2008-10-22 03:59 -!- pgquiles_(~pgquiles@156.Red-88-25-133.staticIP.rima-tde.net) has joined #tux3 2008-10-22 04:59 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-22 06:55 -!- mingming(~mingming@c-24-22-117-202.hsd1.or.comcast.net) has joined #tux3 2008-10-22 08:21 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-22 10:07 -!- mingming(~mingming@32.97.110.51) has joined #tux3 2008-10-22 10:45 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-22 10:47 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-22 15:02 -!- kbingham_(~kbingham@82-46-4-172.cable.ubr03.aztw.blueyonder.co.uk) has joined #tux3 2008-10-22 15:30 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-22 15:54 cscope it pretty nice! 
2008-10-22 16:02 it is 2008-10-22 16:02 faster than lxr 2008-10-22 16:03 but no back button or open in new tab or post url to code 2008-10-22 16:03 I'd better wrap up this post and post it before my head explodes 2008-10-22 16:03 I'm looking forward to the sound of other people's heads exploding 2008-10-22 17:15 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-22 18:22 shapor, what do you suppose is more correct for ctime, should it be as of the write to buffer cache, or as of the actual transfer to disk? 2008-10-22 18:39 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-22 18:40 -!- ajonat(~ajonat@190.48.101.29) has joined #tux3 2008-10-22 20:09 buffer cache 2008-10-22 20:45 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-22 20:51 ...one filnal prooread... 2008-10-22 20:53 hey 2008-10-22 21:20 there, enough of that 2008-10-22 21:22 http://tux3.org/pipermail/tux3/2008-October/000300.html 2008-10-22 22:05 emacs (and modified cscope.el) and cscope is best tool to me for now 2008-10-22 22:10 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-23 00:27 flips: nice 2008-10-23 00:28 ACTION is reading now 2008-10-23 00:28 enjoy 2008-10-23 00:29 was doing some testing tonight against the scheduler. there's no way the current scheduler rebalancing code can guarantee determinancy since it still balance on a best-effort and can double/cross lock runqueues delaying the cpu local schedule() calls from being able to reschedule 2008-10-23 00:29 I'll have do some kind of rt based processor isolations that's possibly dynamic 2008-10-23 00:29 determinacy? 
2008-10-23 00:29 flips: you and matt are a great combination, like two peas in a po 2008-10-23 00:30 flips: deterministic latency 2008-10-23 00:30 seems like 2008-10-23 00:30 the -rt patch is fully preemptible but the schedule is mismatch because it's largely best effort 2008-10-23 00:31 s/realtime/rubbertime/ 2008-10-23 00:34 I wonder if the -rt patch still fails to do swsuspend properly 2008-10-23 00:35 don't know 2008-10-23 00:37 po=pod 2008-10-23 00:54 flips: btw, you'll have ot abstract the reagular file handling code with the metadata file stuff if you didn't know that already using routines so that the metadata files aren't... 2008-10-23 00:54 treat the same as regular files. They'll still use basic file load routines and stuff, but not have the same semantics in the fs 2008-10-23 00:54 you probably know that already 2008-10-23 00:56 you'll need some kind atomic write barrier, I guess, as well 2008-10-23 00:59 ACTION remembers soft-updates in ffs 2008-10-23 00:59 er FreeBSD UFs 2008-10-23 01:03 flips: the email is too design heavy for most regular lkml folks, but keep on going... 2008-10-23 01:05 ACTION never liked that aspect of Linux kernel culture 2008-10-23 01:07 flips: isn't this going to require VM changes as well for the forked-buffer stuff ? 2008-10-23 01:07 or does that stuff already exist from the ext3 work ? 2008-10-23 01:09 Well, it's the best of the worse ways :\ 2008-10-23 01:15 -!- vcgomes(~vcgomes@li17-238.members.linode.com) has joined #tux3 2008-10-23 01:18 bh, I didn't post it to lkml 2008-10-23 01:19 bh, all the necessary interfaces are available to modules 2008-10-23 01:19 and if they weren't, I 'd make them 2008-10-23 01:19 ok, what about the buffer forking ? 
2008-10-23 01:19 ok 2008-10-23 01:19 makes snse 2008-10-23 01:19 sense 2008-10-23 01:19 should work out fine 2008-10-23 01:19 because it's kind of a dramatic thing for the VM 2008-10-23 01:19 we'll make it a tux3 U homework project 2008-10-23 01:20 ext3 does a similar thing 2008-10-23 01:20 yeah, figured as much 2008-10-23 01:20 which is one of the reasons the interfaces have to be exposed 2008-10-23 01:20 but it's probably not as sophisticated 2008-10-23 01:21 it's a journal, it has its own sophistication 2008-10-23 01:21 read the linked pdf for some great entertainment 2008-10-23 01:22 showing design stuff on lkml isn't entirely pointless, at least jon corbet reads it 2008-10-23 01:22 and understands 2008-10-23 01:23 that's good 2008-10-23 01:23 he kind of cares 2008-10-23 01:24 man this is going to trigger edge cases like crazy 2008-10-23 01:26 ACTION wishes he can work on this :\ 2008-10-23 01:26 easy enough to get that wish granted 2008-10-23 01:28 nice speculative recovery 2008-10-23 01:28 posting to lkml won't hurt 2008-10-23 01:28 it's interesting reading 2008-10-23 01:29 flips: you store the metadata/header for extents in reverse order right ? 2008-10-23 01:30 yes 2008-10-23 01:30 nice 2008-10-23 01:31 because I'm thiking about that blob thing 2008-10-23 01:31 you can versio that metadata with constant in a fixed location 2008-10-23 01:32 versio? 
2008-10-23 01:32 version I guess 2008-10-23 01:32 extent version using a special number 2008-10-23 01:32 yes 2008-10-23 01:33 so that you kow how to read that structure 2008-10-23 01:33 versioning strategy's pretty well worked out 2008-10-23 01:33 typing with one hand right now :) 2008-10-23 01:52 ok night 2008-10-23 01:52 hello 2008-10-23 03:48 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-23 04:30 -!- bh(~billh@ip68-107-26-122.sd.sd.cox.net) has joined #tux3 2008-10-23 07:07 -!- pgquiles_(~pgquiles@156.Red-88-25-133.staticIP.rima-tde.net) has joined #tux3 2008-10-23 07:12 -!- mlankhorst(~m@fw1.astro.rug.nl) has joined #tux3 2008-10-23 09:07 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-23 09:10 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-23 10:08 -!- mingming(~mingming@32.97.110.51) has joined #tux3 2008-10-23 10:13 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-23 10:42 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-23 10:59 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-23 11:41 quick q: page->lru is something that filesystem will not mess with, right? 2008-10-23 12:02 -!- pgquiles(~pgquiles@156.Red-88-25-133.staticIP.rima-tde.net) has joined #tux3 2008-10-23 12:06 IIRC, page->lru is vm stuff. it will be used to manage which is page active. and if vm want more free memory, it may ask to clean inactive pages to fs 2008-10-23 12:07 great! 
2008-10-23 12:07 I'm the VM in my case ;-) 2008-10-23 12:35 -!- pgquiles(~pgquiles@156.Red-88-25-133.staticIP.rima-tde.net) has joined #tux3 2008-10-23 12:56 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-23 13:23 -!- FelipeS_(~Felipe@lawn-128-61-31-5.lawn.gatech.edu) has joined #tux3 2008-10-23 14:46 razvanm, correct 2008-10-23 14:46 thought the filesystem my move pages to the front or back of the lru queue if it thinks it knows something the vmm doesn't 2008-10-23 15:02 my tux3.notes file is about 2,000 lines long 2008-10-23 15:02 mostly consisting of posts I haven't posted 2008-10-23 15:18 wow... big backlog :P 2008-10-23 15:59 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-23 16:43 folks 2008-10-23 17:16 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-23 17:30 ponk 2008-10-23 17:33 sk8 oclock 2008-10-23 19:11 hello 2008-10-23 19:11 which stage will do "physical remapping"? rollup? 2008-10-23 19:13 hirofumi, not rollup 2008-10-23 19:13 phase transition 2008-10-23 19:13 oh 2008-10-23 19:13 when a new phase is ready to commit to disk, first think to do is flush all dirty inodes 2008-10-23 19:14 all dirty inodes means ileaf? or whole itable btree? 
2008-10-23 19:15 flushing dirty inodes in kernel would call write_pages, in tux3 userspace calls write_buffer->map->ops->brwrite 2008-10-23 19:15 dirty inode table blocks have to be flushed too 2008-10-23 19:15 well 2008-10-23 19:15 not flushed, but committed 2008-10-23 19:16 flushing is just the process of committing cached data to writeout 2008-10-23 19:16 yes 2008-10-23 19:16 the whole btree does not have to be committed 2008-10-23 19:17 because we have the "promise" system 2008-10-23 19:17 -!- FelipeS_(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-23 19:17 ACTION have to read that email more deeply 2008-10-23 19:17 we just write out the leaf nodes and "promise" to update the pointers in parents 2008-10-23 19:18 maybe promise is logical logging? 2008-10-23 19:18 yes 2008-10-23 19:18 i see 2008-10-23 19:19 I used to call it logical records in commit blocks 2008-10-23 19:19 promise is short for that 2008-10-23 19:19 i see 2008-10-23 19:20 another question is: 2008-10-23 19:20 modified buffers in active tree can't free, because we don't know 2008-10-23 19:20 final state of that buffer until rollup? If so, we must pin many 2008-10-23 19:20 btree-index buffers (etc.) of active tree if user modified may inodes? 2008-10-23 19:20 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-23 19:20 exactly 2008-10-23 19:21 i see 2008-10-23 19:21 that is the "dirty metadata" 2008-10-23 19:21 which must be reconstructed if we crash 2008-10-23 19:21 using the logical records in the commit blocks (promises) 2008-10-23 19:21 yes 2008-10-23 19:22 even if vm wants more memory, we can't free those buffers? 2008-10-23 19:27 how will we handle ENOSPC? we must do rollup/pahse transition to make more free space? 
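The "promise" idea works like this. At commit, new leaves are written out, but their parents are not rewritten. Instead, a logical record goes into the commit block, and recovery replays those records to rebuild the dirty parents. The sketch below shows the general shape under stated assumptions. The record layout (parent, slot, child), the fixed capacity, and every name here are hypothetical and are not tux3's actual on-disk format.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PROMISES 16
#define SLOTS 4

/* Hypothetical logical record: "slot `slot` of index node `parent`
 * should now point at block `child`". */
struct promise {
	uint32_t parent;
	uint32_t slot;
	uint64_t child;
};

struct commit_block {
	uint32_t count;
	struct promise rec[MAX_PROMISES];
};

/* Parent index nodes as last written to disk (possibly stale). */
struct index_node {
	uint64_t child[SLOTS];
};

/* At commit: the new leaf is already on disk at `child`. Log a promise
 * instead of rewriting the parent. Returns -1 when the commit block is
 * full (a real fs would then need a rollup). */
static int log_promise(struct commit_block *cb, uint32_t parent,
		       uint32_t slot, uint64_t child)
{
	if (cb->count == MAX_PROMISES)
		return -1;
	cb->rec[cb->count++] = (struct promise){ parent, slot, child };
	return 0;
}

/* After a crash: reconstruct the dirty metadata by replaying the
 * promises in log order over the parents as found on disk. */
static void replay_promises(const struct commit_block *cb,
			    struct index_node *nodes)
{
	for (uint32_t i = 0; i < cb->count; i++) {
		const struct promise *p = &cb->rec[i];
		nodes[p->parent].child[p->slot] = p->child;
	}
}
```

Replay order matters here. A later promise for the same slot overrides an earlier one, so the rebuilt parent reflects the newest leaf.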
2008-10-23 19:28 if it is possible 2008-10-23 19:39 hirofumi, yes, those buffers pin memory even of the vm is low on memory, so we need to make sure not to use too much 2008-10-23 19:40 i see 2008-10-23 19:40 the closer we get to filesystem full, the shorter a phase can be 2008-10-23 19:41 when the vmm is very low on memory, it sets the PF_MEMALLOC flag and calls ->writepage to free memory 2008-10-23 19:41 the PF_MEMALLOC flag gives the filesystem access to an emergency reserve of a few megabytes 2008-10-23 19:42 yes 2008-10-23 19:43 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-23 19:43 otherwise, if cache memory is low but the vmm has not called our filesystem to write out dirty pages, then we will just block in alloc_pages waiting for memory to be freed 2008-10-23 19:45 i see. I thought if we can handle with more few memory, it's great 2008-10-23 19:46 it won't use much memory, it's just index blocks that are pinned 2008-10-23 19:46 each index block references up to 512 data blocks 2008-10-23 19:47 index blocks and bitmap blocks, each bitmap block covers 128 MB of filesystem blocks 2008-10-23 19:47 i see 2008-10-23 19:48 if we need to unpin some then we do a rollup 2008-10-23 19:48 being careful to always have enough memory in the emergency reserve to do the rollup 2008-10-23 19:49 i see 2008-10-23 19:50 i read someone says modern fs uses too much memory 2008-10-23 19:50 btrfs/hammer etc. 2008-10-23 19:51 zfs 2008-10-23 19:51 yes 2008-10-23 19:51 they aren't careful with memory 2008-10-23 19:51 I have been very careful 2008-10-23 19:51 i really dislike that 2008-10-23 19:52 right, it's no good to have more memory if it is just wasted 2008-10-23 19:52 how do they waste it? 2008-10-23 19:52 ZFS has 128 byte block pointers for example 2008-10-23 19:54 128 bytes?!? 2008-10-23 19:54 1024 bits? 2008-10-23 19:54 huge space 2008-10-23 19:54 why do they need them so big? 2008-10-23 19:55 good question 2008-10-23 19:55 128bits? 
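The coverage figures quoted here come from simple arithmetic. One bitmap block holds one bit per block, and 4 KiB × 8 bits × 4 KiB gives 128 MiB. The figure of 512 data blocks per index block matches 4 KiB blocks with 8-byte entries. That entry size is an assumption made for this sketch, not a statement of tux3's index format. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Entries per index block, assuming fixed-size entries. */
static uint64_t blocks_per_index(uint64_t block_size, uint64_t entry_size)
{
	return block_size / entry_size;
}

/* Bytes of volume tracked by one bitmap block: one bit per block. */
static uint64_t bytes_per_bitmap(uint64_t block_size)
{
	return block_size * 8 * block_size;
}

/* Index blocks pinned by `dirty` bytes of logically contiguous dirty
 * data. This is the best case; scattered writes can pin up to one
 * index block per dirty data block. */
static uint64_t index_blocks_for(uint64_t dirty, uint64_t block_size,
				 uint64_t entry_size)
{
	uint64_t data_blocks = (dirty + block_size - 1) / block_size;
	uint64_t per_index = blocks_per_index(block_size, entry_size);
	return (data_blocks + per_index - 1) / per_index;
}
```

Under these assumptions, 1 GiB of contiguous dirty data pins 512 index blocks, which is 2 MiB of cache.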
not bytes 2008-10-23 19:55 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-23 19:55 128 bits sounds better :D 2008-10-23 19:55 one thing is, when they do their raid, they like to put multiple pointers to redundant copies of the block in the block pointer 2008-10-23 19:55 hi 2008-10-23 19:55 128 bytes, yes 2008-10-23 19:56 hi raluca 2008-10-23 19:56 ACTION has bad manners :P 2008-10-23 19:57 ah 2008-10-23 19:57 in that huge size they have plenty of space to even store a sha1 or an md5 2008-10-23 19:58 yes, they do that 2008-10-23 20:00 it's tux3 oclock 2008-10-23 20:00 yup 2008-10-23 20:00 I was looking for the zfs repository 2008-10-23 20:00 ACTION is ready 2008-10-23 20:00 I've been in there before 2008-10-23 20:00 google didn't find it right away 2008-10-23 20:01 left as an exercise for the interested reader how they wasted, err, needed 128 bytes for a block pointer 2008-10-23 20:01 i'm useing git for opensolaris 2008-10-23 20:01 right, there's an online repo somewhere on opensolaris.org 2008-10-23 20:02 google just didn't find it right away 2008-10-23 20:02 git://repo.or.cz/opensolaris.git 2008-10-23 20:02 ok, let's be a little selfish today and take a look at something we actually need for tux3 2008-10-23 20:02 mirror though 2008-10-23 20:02 the latest post mentions "forking" a buffer 2008-10-23 20:03 that happens when we want to change a buffer, but it is already committed to writeout 2008-10-23 20:03 hey 2008-10-23 20:03 or another way of putting it, it not in the current phase 2008-10-23 20:03 so we can't change it any more 2008-10-23 20:04 what we do is remove the underlying page from the buffer cache, or in kernel, the page cache 2008-10-23 20:04 copy the data to another page, and put that in the page cache 2008-10-23 20:04 let's look at kernel code to see how we might make that work 2008-10-23 20:04 where should we look first? 2008-10-23 20:04 buffer.c? :D 2008-10-23 20:05 what are we looking for? 
2008-10-23 20:05 a write 2008-10-23 20:05 we're looking for where the block is cached 2008-10-23 20:05 remember, in kernel, buffers are just handles for block IO 2008-10-23 20:06 as opposed to in tux3 userspace where we tend to think of them as cached blocks 2008-10-23 20:06 well, the still are, but in kernel they are not the primary unit 2008-10-23 20:06 pages are 2008-10-23 20:07 there are two kinds of places where filesystems cache block 2008-10-23 20:07 the "buffer cache", which is just a page cache mapped one to one to the block device 2008-10-23 20:07 and the so called "page cache" which is a page cache per inode 2008-10-23 20:08 "page cache" is actually misnamed, it sounds like one big caches, it's actually lots of caches 2008-10-23 20:08 ok, so where we should look depends on the kind of block we need to fork 2008-10-23 20:09 suppose it is a directory entry block, where do we look? 2008-10-23 20:09 dirent is cached as page cache, or buffer cache? 2008-10-23 20:09 tell me 2008-10-23 20:10 and your logic 2008-10-23 20:10 dirent is allocted using the slab allocator 2008-10-23 20:10 that was a wild guess 2008-10-23 20:10 and the content of the directory is just a file 2008-10-23 20:10 right 2008-10-23 20:10 so we should look in... 2008-10-23 20:11 ok, let's look at ext3_bread 2008-10-23 20:11 see where it goes 2008-10-23 20:12 should we use .26 or .27 today? 2008-10-23 20:12 let's try .27 2008-10-23 20:12 http://lxr.linux.no/linux+v2.6.27/fs/ext3/inode.c#L1054 2008-10-23 20:12 keep up with mainline, more or less 2008-10-23 20:12 yes 2008-10-23 20:12 RazvanM, ok, follow it in, see where it goes 2008-10-23 20:14 found the next function in? 2008-10-23 20:14 (sorry, I was looking for the dentry_cache :P) 2008-10-23 20:15 next is ext3_getblk 2008-10-23 20:15 right 2008-10-23 20:15 and where does it go from there? 
2008-10-23 20:15 the main call 2008-10-23 20:15 hint: don't worry about the ext3 handle 2008-10-23 20:15 ll_rw_block 2008-10-23 20:16 for now 2008-10-23 20:16 which looks to be deprecated :| 2008-10-23 20:16 look closer 2008-10-23 20:16 sb_getblk after get_block like op 2008-10-23 20:16 right 2008-10-23 20:16 let's see how that works 2008-10-23 20:16 sb_getblk 2008-10-23 20:16 we've looked at it before 2008-10-23 20:16 it sounds familiar :P 2008-10-23 20:17 http://lxr.linux.no/linux+v2.6.27/fs/buffer.c#L1403 2008-10-23 20:17 right 2008-10-23 20:18 follow the _slow path 2008-10-23 20:18 I know those 2008-10-23 20:18 this is where the kernel code is really crappy ;) 2008-10-23 20:18 the block to page mapping is done in grow_buffers :p 2008-10-23 20:18 http://lxr.linux.no/linux+v2.6.27/fs/buffer.c#L1119 2008-10-23 20:19 grow_buffers :D 2008-10-23 20:19 love the name 2008-10-23 20:19 I woudn't expect anybody to guess taht 2008-10-23 20:19 took me 10 minutes to figure it out last time we were in here 2008-10-23 20:19 1109 /* Create a page with the proper size buffers.. */ 2008-10-23 20:20 http://lxr.linux.no/linux+v2.6.27/fs/buffer.c#L1028 2008-10-23 20:20 1048 if (!try_to_free_buffers(page)) 2008-10-23 20:20 1049 goto failed; 2008-10-23 20:20 <- lovely 2008-10-23 20:21 and at failed we BUG 2008-10-23 20:21 this mechanism is in a state of transition ;) 2008-10-23 20:21 1055 bh = alloc_page_buffers(page, size, 0); 2008-10-23 20:21 flips: can you explain again the link between the page and buffer_head? :D 2008-10-23 20:21 each page has a page to attach a circular list of buffer_heads to it 2008-10-23 20:22 as many as there can be blocks on the page 2008-10-23 20:22 usually one 2008-10-23 20:22 page is 4k usually 2008-10-23 20:22 yes 2008-10-23 20:22 is the block usually 4k? 
2008-10-23 20:22 yes 2008-10-23 20:22 cool 2008-10-23 20:22 default for nearly all filesystems 2008-10-23 20:22 didn't know that :D 2008-10-23 20:23 it's not very cool actually, because 4K is a bit small on modern hardware 2008-10-23 20:23 non-unix fs has 512 bytes 2008-10-23 20:23 this is a big flaw in linux 2008-10-23 20:23 can't have a buffer bigger than a page 2008-10-23 20:23 romfs has 1KB I think :P 2008-10-23 20:24 smaller blocks create less external fragmentation 2008-10-23 20:24 one more q: what happens when there is more than one bh in a page? 2008-10-23 20:24 that will only be the case when block size is a fraction of page size 2008-10-23 20:24 right 2008-10-23 20:24 see all that code that checks for buffers being there and puts them there if they are not 2008-10-23 20:25 sometime when you have a lot of time on your hands, go read try_to_free_buffers 2008-10-23 20:25 ...the worst function in the entire kernel 2008-10-23 20:25 so the bh in page are continuos? 2008-10-23 20:25 continous? 2008-10-23 20:25 continuous? 2008-10-23 20:26 contiguous 2008-10-23 20:26 yes 2008-10-23 20:26 what space on the disk do they cover? 2008-10-23 20:26 and that's important 2008-10-23 20:26 because what it does in the case of block smaller than page is create false sharing 2008-10-23 20:26 good... I didn't know that :D 2008-10-23 20:27 not contiguous on disk 2008-10-23 20:27 contiguous in memory 2008-10-23 20:27 aaaaaa 2008-10-23 20:27 sorry 2008-10-23 20:27 grrr... 2008-10-23 20:27 I liked the other answer better :D 2008-10-23 20:27 yes, that leads to a lot of headaches 2008-10-23 20:27 exactly!! 2008-10-23 20:28 so in tux3, we want to branch a buffer, but we actually have to mess with a whole page 2008-10-23 20:28 but in tux3 the buffer will be 4k, right?
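The page/buffer_head relationship can be modeled in a few lines. A page carries page_size / block_size buffers on a circular list, which in the kernel is linked through b_this_page. Their data pointers are contiguous in memory, but each one carries its own disk address. The struct below is a hypothetical cut-down buffer_head. Allocating the ring as one array is a simplification, not how the kernel allocates buffer heads.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical cut-down buffer_head: just enough to show the layout. */
struct toy_bh {
	struct toy_bh *b_this_page;  /* next buffer on the same page (circular) */
	char *b_data;                /* where this block lives in the page */
	uint64_t b_blocknr;          /* disk address, filled in later by get_block */
	unsigned b_size;
};

/* Attach page_size/block_size buffers to one page. The b_data pointers
 * are contiguous in memory; the b_blocknr values are independent, so
 * nothing forces the blocks to be contiguous on disk. */
static struct toy_bh *toy_create_buffers(char *page, unsigned block_size)
{
	unsigned n = TOY_PAGE_SIZE / block_size;
	struct toy_bh *bh = calloc(n, sizeof *bh);
	if (!bh)
		return NULL;
	for (unsigned i = 0; i < n; i++) {
		bh[i].b_data = page + i * block_size;
		bh[i].b_size = block_size;
		bh[i].b_this_page = &bh[(i + 1) % n];  /* close the ring */
	}
	return bh;  /* head of the ring; release with free() */
}

/* Walk the ring once and count its members. */
static unsigned toy_count_ring(const struct toy_bh *head)
{
	unsigned n = 0;
	const struct toy_bh *bh = head;
	do {
		n++;
		bh = bh->b_this_page;
	} while (bh != head);
	return n;
}
```

With 1 KiB blocks, a page carries four buffers. Replacing the page therefore affects all four, which is the false-sharing problem for forking mentioned above.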
2008-10-23 20:28 not necessarily 2008-10-23 20:28 tux3 can handle 256 byte blocks 2008-10-23 20:29 I think we decided to make the smallest 512 2008-10-23 20:29 linux sector size 2008-10-23 20:29 yes :) 2008-10-23 20:29 let's keep going in 2008-10-23 20:29 ok 2008-10-23 20:29 find_or_create_page 2008-10-23 20:30 http://lxr.linux.no/linux+v2.6.27/mm/filemap.c#L720 2008-10-23 20:30 add_to_page_cache_lru 2008-10-23 20:30 add_to_page_cache 2008-10-23 20:31 add_to_page_cache_locked 2008-10-23 20:31 this looks also familiar... 2008-10-23 20:31 radix_tree_insert 2008-10-23 20:32 http://lxr.linux.no/linux+v2.6.27/lib/radix-tree.c#L291 2008-10-23 20:32 there we see the nice new rcu code that got added by peterz in the last cycle 2008-10-23 20:32 lockless pagecache? 2008-10-23 20:33 wait, it was there before 2008-10-23 20:34 ok, let's poke around in _insert for a while 2008-10-23 20:34 it's good to have an idea what happens there 2008-10-23 20:35 it's a radix tree with branching factor 64 2008-10-23 20:35 meaning we have a lot of page cache pointers sitting next to each other 2008-10-23 20:35 it's tempting to use that fact when we are operating on pages that are contiguous in the page cache 2008-10-23 20:35 to avoid lookups 2008-10-23 20:36 I don't know of any kernel code that has actually done that though 2008-10-23 20:36 also haven't looked hard 2008-10-23 20:37 note: lockless page cache is due to nick piggin 2008-10-23 20:37 and it isn't completely merged yet 2008-10-23 20:37 I presume that a goal is to get rid of even the rcu locks from the radix tree 2008-10-23 20:38 oh, i thought it was done 2008-10-23 20:38 part went in 2008-10-23 20:38 _insert is actually pretty simple 2008-10-23 20:39 add levels if we're trying to insert at a high address 2008-10-23 20:39 otherwise drill down through levels masking off the index 2008-10-23 20:39 empty parts of the tree have null pointers, fill them in if in our path 2008-10-23 20:40 that's about it. 
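The insert walk described here goes as follows. First, add levels on top while the index is too large for the current height. Then drill down, taking 6 bits of the index per level, and fill in NULL interior slots along the path. The sketch below is a simplified model of radix_tree_insert() with 64-way nodes. It leaves out RCU, tags and node preloading, and all the names are hypothetical. For an occupied slot the kernel returns -EEXIST; the sketch returns -2 instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAP_SHIFT 6
#define MAP_SIZE (1UL << MAP_SHIFT)   /* 64 slots per node */
#define MAP_MASK (MAP_SIZE - 1)

struct rnode {
	void *slots[MAP_SIZE];
};

struct rtree {
	unsigned height;     /* 0 = empty tree */
	struct rnode *root;
};

/* Largest index a tree of the given height can hold. */
static uint64_t max_index(unsigned height)
{
	unsigned shift = height * MAP_SHIFT;
	if (shift >= 64)
		return UINT64_MAX;
	return (1ULL << shift) - 1;
}

/* Add levels on top until `index` fits: the old root becomes slot 0
 * of the new root, since it covers indexes 0..max_index(old height). */
static int extend(struct rtree *t, uint64_t index)
{
	while (t->height == 0 || index > max_index(t->height)) {
		struct rnode *n = calloc(1, sizeof *n);
		if (!n)
			return -1;
		n->slots[0] = t->root;
		t->root = n;
		t->height++;
	}
	return 0;
}

/* Drill down from the root, masking off MAP_SHIFT bits per level and
 * allocating missing interior nodes. Returns 0 on success, -1 on
 * allocation failure, -2 if the slot is already occupied. */
static int rtree_insert(struct rtree *t, uint64_t index, void *item)
{
	if (extend(t, index))
		return -1;
	struct rnode *node = t->root;
	for (unsigned h = t->height; h > 1; h--) {
		unsigned off = (index >> ((h - 1) * MAP_SHIFT)) & MAP_MASK;
		if (!node->slots[off]) {
			node->slots[off] = calloc(1, sizeof(struct rnode));
			if (!node->slots[off])
				return -1;
		}
		node = node->slots[off];
	}
	unsigned off = index & MAP_MASK;
	if (node->slots[off])
		return -2;
	node->slots[off] = item;
	return 0;
}

/* Plain lookup along the same path; NULL for holes. */
static void *rtree_lookup(const struct rtree *t, uint64_t index)
{
	if (t->height == 0 || index > max_index(t->height))
		return NULL;
	struct rnode *node = t->root;
	for (unsigned h = t->height; h > 1 && node; h--)
		node = node->slots[(index >> ((h - 1) * MAP_SHIFT)) & MAP_MASK];
	return node ? node->slots[index & MAP_MASK] : NULL;
}
```

Index 5000 needs three levels, because two levels of 64-way nodes only reach index 4095. Neighboring page cache indexes end up as adjacent slots in the same leaf node.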
RCU strangeness to think about 2008-10-23 20:40 otherwise we're done here 2008-10-23 20:40 ok, so what is a tux3 buffer fork going to look like, based on what we just looked at? 2008-10-23 20:41 ok, lookup cache, then copy data to new cache, and insert new pos on radix tree 2008-10-23 20:42 ? 2008-10-23 20:42 basically, and we'll need to worry about locking 2008-10-23 20:42 copying data to the new cache will happen in multiple steps, right? 2008-10-23 20:42 and we need to worry about false sharing 2008-10-23 20:42 there are other blocks on the same page, what happens to them? 2008-10-23 20:43 it's a per-block operation as currently conceived 2008-10-23 20:44 don't copy other blocks, because new cache may not be contiguous 2008-10-23 20:44 hirofumi, when we branch a block we don't change its position 2008-10-23 20:44 um.. 2008-10-23 20:44 we just pull the page that carries the block data out of the page cache, leaving a copy in its place 2008-10-23 20:45 we don't necessarily even need buffer heads on the page we pull out of cache 2008-10-23 20:45 because nobody is going to be changing it, hence no need for per-block locking 2008-10-23 20:45 and we can do the actual transfer to disk with a bio 2008-10-23 20:46 so we will just be pulling the underlying page out and replacing it with a new page 2008-10-23 20:46 that has the effect of forking all the buffers on the same page 2008-10-23 20:46 so we must have some bits in the buffer_head flags to tell us which phase a buffer belongs to 2008-10-23 20:47 um.. one may be ileaf, and one may be dleaf etc.?
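The fork described here works in three steps. If the cached page belongs to an earlier phase, copy it, put the copy at the same cache index, and hand the old page to writeout untouched. The sketch below models only that swap. It keeps a phase number on the page itself, whereas the chat proposes bits in the buffer_head flags. It also uses a plain array instead of the radix tree and takes no locks. All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define CACHE_SLOTS 16

/* Hypothetical page tagged with the phase it belongs to. */
struct toy_page {
	unsigned phase;
	char data[TOY_PAGE_SIZE];
};

struct toy_cache {
	struct toy_page *slot[CACHE_SLOTS];   /* index -> page */
};

/* Return a page at `index` that may be modified in phase `cur`.
 * The page must already be cached. If it belongs to an earlier phase,
 * it is committed to writeout: copy it, install the copy at the same
 * position, and return the old page through *frozen so the writeout
 * path can send it to disk unchanged. Every block on the page is
 * forked together. Returns NULL only if allocation fails. */
static struct toy_page *fork_page(struct toy_cache *c, unsigned long index,
				  unsigned cur, struct toy_page **frozen)
{
	struct toy_page *old = c->slot[index];
	assert(old != NULL);
	*frozen = NULL;
	if (old->phase == cur)
		return old;               /* already in this phase: no fork */
	struct toy_page *copy = malloc(sizeof *copy);
	if (!copy)
		return NULL;
	memcpy(copy->data, old->data, TOY_PAGE_SIZE);
	copy->phase = cur;
	c->slot[index] = copy;            /* same position, new page */
	*frozen = old;
	return copy;
}
```

Modifying the returned copy leaves the frozen page's contents intact, which is the property the writeout of the previous phase depends on.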
2008-10-23 20:47 that is, whether it has already been forked or not 2008-10-23 20:47 here in a file page cache we will only find file data or directory data or bitmap block 2008-10-23 20:47 later atom stuff 2008-10-23 20:47 ileaf and dleaf live in the buffer cache 2008-10-23 20:48 which is direct-mapped to the block device 2008-10-23 20:48 it is handled in much the same way 2008-10-23 20:48 ext3 does not directly perform operations on the buffer cache, I think I recall 2008-10-23 20:49 but lets the vfs do it 2008-10-23 20:49 using the generic_ functions 2008-10-23 20:49 and ext3 just supplies a ->get_block function 2008-10-23 20:49 well, let's go see how ext3_get_block works 2008-10-23 20:50 at some point it obviously has to go read some metadata 2008-10-23 20:50 most likely with sb_bread 2008-10-23 20:50 yes 2008-10-23 20:51 http://lxr.linux.no/linux+v2.6.27/fs/ext3/inode.c#L953 2008-10-23 20:51 then http://lxr.linux.no/linux+v2.6.27/fs/ext3/inode.c#L786 2008-10-23 20:51 these functions are a little oddly structured 2008-10-23 20:51 because they are lockless 2008-10-23 20:52 ext3_block_to_path just does arithmetic 2008-10-23 20:52 because caller has lock of requested page? 2008-10-23 20:53 no, it's completely lockless 2008-10-23 20:53 it uses the block pointers like locks 2008-10-23 20:53 um.. what happen if it was truncated? 
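As noted here, ext3_block_to_path only does arithmetic: it turns a logical block number into array offsets. The first 12 blocks map directly from the inode. After those come the single, double and triple indirect trees in slots 12, 13 and 14. The sketch below is a standalone version modeled on the ext3 function. It drops the boundary output and the warnings, and it takes ptrs_bits (log2 of pointers per block, 10 for 4 KiB blocks with 4-byte pointers) as a parameter instead of reading it from the superblock.

```c
#include <assert.h>

/* Slot numbers in the inode's block array, as in ext3. */
#define NDIR_BLOCKS 12
#define IND_BLOCK   NDIR_BLOCKS
#define DIND_BLOCK  (IND_BLOCK + 1)
#define TIND_BLOCK  (DIND_BLOCK + 1)

/* Translate a logical block number into the chain of offsets used to
 * walk from the inode down to the data block pointer. Returns the path
 * depth (1..4), or 0 if the block is out of range. */
static int block_to_path(long i_block, int ptrs_bits, int offsets[4])
{
	long ptrs = 1L << ptrs_bits;
	long double_blocks = 1L << (ptrs_bits * 2);
	int n = 0;

	if (i_block < 0)
		return 0;
	if (i_block < NDIR_BLOCKS) {
		offsets[n++] = (int)i_block;
	} else if ((i_block -= NDIR_BLOCKS) < ptrs) {
		offsets[n++] = IND_BLOCK;
		offsets[n++] = (int)i_block;
	} else if ((i_block -= ptrs) < double_blocks) {
		offsets[n++] = DIND_BLOCK;
		offsets[n++] = (int)(i_block >> ptrs_bits);
		offsets[n++] = (int)(i_block & (ptrs - 1));
	} else if (((i_block -= double_blocks) >> (ptrs_bits * 2)) < ptrs) {
		offsets[n++] = TIND_BLOCK;
		offsets[n++] = (int)(i_block >> (ptrs_bits * 2));
		offsets[n++] = (int)((i_block >> ptrs_bits) & (ptrs - 1));
		offsets[n++] = (int)(i_block & (ptrs - 1));
	}
	return n;
}
```

With 4 KiB blocks, logical block 2065 becomes the path {13, 1, 5}. The walk goes to the double-indirect slot, then entry 1 of that block, then entry 5 of the next.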
2008-10-23 20:53 it checks and backs out the operation 2008-10-23 20:53 see verify_chain 2008-10-23 20:54 ok, 367 bh = sb_bread(sb, le32_to_cpu(p->key)); 2008-10-23 20:54 in ext3_get_branch 2008-10-23 20:54 so we got to a function that is familiar 2008-10-23 20:55 yes 2008-10-23 20:55 oh thing to watch out for: some of the blocks on the buffer cache page may be data, that is, in an inode page cache, and some may be metadata, in the bufffer cache 2008-10-23 20:55 I haven't thought about what impact that might have on the forking operation 2008-10-23 20:55 it's probably ok, but needs to be thought about 2008-10-23 20:56 ok, that's enough for today 2008-10-23 20:56 how'd we do on the interesting front? 2008-10-23 20:57 do forking? 2008-10-23 20:57 I meant, was it interesting? 2008-10-23 20:58 yes 2008-10-23 20:58 this lesson was above my knowledge level :P I have to dig more to qualify for it :P 2008-10-23 20:58 razvanm, it's not above your level 2008-10-23 20:59 just read through the log once more and it will all look simple 2008-10-23 20:59 I will :D 2008-10-23 21:00 next time I think we might take a look at how we might go about doing filesystem IO without having a tux3_get_block function 2008-10-23 21:00 I don't know myself whether it's practical to avoid this 2008-10-23 21:00 it's not particularly hard to implement 2008-10-23 21:00 but I think I want to see if we can just avoid the whole block IO library and work directly with the page cache and bio 2008-10-23 21:00 IIRC, btrfs uses get_extents or something 2008-10-23 21:00 sb_bread is our friend 2008-10-23 21:01 new invention 2008-10-23 21:01 I don't want to go that far just yet 2008-10-23 21:01 the api is likely to take some time to settle 2008-10-23 21:01 well 2008-10-23 21:01 I think it is the get_ model that is broken 2008-10-23 21:01 not the block vs extents 2008-10-23 21:02 the get_ model assumes there is a place to cache the physical pointer 2008-10-23 21:02 which historically has been the buffer_head 
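The lockless check mentioned here ("it uses the block pointers like locks") comes down to one comparison per step. Each step of the lookup remembers where a pointer was read from and what value was read. The path is still valid only if every one of those locations still holds the same value. The sketch below is modeled on verify_chain() in fs/ext3/inode.c, with simplified types and a hypothetical struct name.

```c
#include <assert.h>
#include <stdint.h>

/* One step of a lookup path, like ext3's Indirect: the location the
 * pointer was read from (p) and the value that was read (key). */
struct indirect {
	uint32_t *p;
	uint32_t key;
};

/* Nonzero if every pointer on the path still holds the value we read.
 * If a concurrent truncate changed any of them, the caller backs out
 * and retries the lookup. */
static int verify_chain(const struct indirect *from, const struct indirect *to)
{
	while (from <= to && from->key == *from->p)
		from++;
	return from > to;
}
```

In this model, a truncate that zeroes an indirect pointer after it was read makes verify_chain fail, and that failure triggers the back-out.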
2008-10-23 21:02 but it turns out that the cached physical pointer is rarely used 2008-10-23 21:02 i see 2008-10-23 21:02 not enough to justify the strange mechanisms that are in place to handle it 2008-10-23 21:03 ah, becase we use forking on write. physical pointer is not used much? 2008-10-23 21:03 because 2008-10-23 21:04 I am thinking that it caching physical pointers is a win, then there should be a library function to do that that all filesystems can use, so that the whole IO path does not revolve around the need for a fs to generate a physical pointer 2008-10-23 21:04 forking happens on the "top end" of filesystem and is not part of writeout 2008-10-23 21:04 it is to avoid stalls in buffer IO calls 2008-10-23 21:05 "remapping" is where we change physical pointers around 2008-10-23 21:05 which occurs in filemap.c, during an inode flush 2008-10-23 21:05 and will also need to happen when we write out inode table blocks during phase commit 2008-10-23 21:06 and index blocks during rollup 2008-10-23 21:07 um.. natural delayed write.. 2008-10-23 21:08 delayed write -> delayed allocation 2008-10-23 21:09 yes 2008-10-23 21:09 I'm thinking about adding delayed inode number assignment too 2008-10-23 21:09 oh, i see 2008-10-23 21:10 could possibly confuse nfs 2008-10-23 21:13 thanks. this talk seems to clear my brain more or less 2008-10-23 21:15 you're welcome 2008-10-23 21:30 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-23 21:30 hmm, a little late 2008-10-23 21:30 ;-) 2008-10-23 21:31 good think we have a log 2008-10-23 21:31 good thing 2008-10-23 21:35 yup catching up now 2008-10-23 21:52 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-23 21:52 hi tim_dimm 2008-10-23 21:53 flips... 
2008-10-23 21:53 word 2008-10-23 22:02 dword 2008-10-23 22:03 but not msword 2008-10-23 22:03 ddword 2008-10-23 22:11 qword 2008-10-23 22:11 f word 2008-10-23 22:16 wor dup 2008-10-23 22:16 sword 2008-10-23 22:16 *swish* 2008-10-23 22:16 ACTION cuts a swatch through the witty banter 2008-10-23 22:16 swishy 2008-10-23 22:17 wordy 2008-10-24 01:51 folks 2008-10-24 01:51 I see people are talking about the lockless page cache 2008-10-24 02:17 a bit of... 2008-10-24 05:54 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-24 08:30 -!- mingming(~mingming@c-24-22-117-202.hsd1.or.comcast.net) has joined #tux3 2008-10-24 08:52 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-24 09:05 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-24 09:15 -!- data(~data@echo489.server4you.de) has joined #tux3 2008-10-24 09:21 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-24 10:06 -!- pgquiles(~pgquiles@156.Red-88-25-133.staticIP.rima-tde.net) has joined #tux3 2008-10-24 10:15 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-24 10:36 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-24 10:45 -!- mingming(~mingming@32.97.110.51) has joined #tux3 2008-10-24 12:31 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-24 15:11 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-24 15:44 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-24 16:12 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-24 16:13 ACTION got to the point where he needs to understand how get_block works 2008-10-24 16:17 ACTION wishes underscores didn't make him cringe 2008-10-24 16:18 ravanm, the vfs calls the filesystem saying "for this buffer, what is the physical address?" 
and the filesystem fills in the b_blocknr in the buffer 2008-10-24 16:19 the function is misnamed 2008-10-24 16:19 it doesn't get the block, but the address of the block 2008-10-24 16:20 or preciscely, "for this inode at this logical block offset, please fill in the buffer->b_blocknr" 2008-10-24 16:20 it's actually dumb to use a buffer to pass the result back, it should just be the function result 2008-10-24 16:25 thanks for the answer! :D 2008-10-24 16:25 -!- data(~data@echo489.server4you.de) has joined #tux3 2008-10-24 16:25 to fill the context, what I'm trying to implement is block_read_full_page 2008-10-24 16:28 well, I think it is time to do a big directory move, move everything out of user/test into / in the repo 2008-10-24 16:28 (thinking out loud) the block_read_full_page assumes that the page is filled by some bhs 2008-10-24 16:28 any objection? 2008-10-24 16:28 block_read_full_page will add buffers if there are none 2008-10-24 16:29 from where it will take the info to do that? 2008-10-24 16:29 from the index of the page? 2008-10-24 16:30 it only needs to know the block size 2008-10-24 16:30 which get gets from page->mapping->sb 2008-10-24 16:30 how about the start point? 2008-10-24 16:30 ok, let me see, we have doc/ in the top level of the repo 2008-10-24 16:30 (in romfs it was from index: http://lxr.linux.no/linux+v2.6.26/fs/romfs/inode.c#L432 ) 2008-10-24 16:31 so maybe I should just move user/test/* to user/* 2008-10-24 16:31 and then there will be a kernel and a fuse one? 2008-10-24 16:31 aaa... fuse is inside the user now 2008-10-24 16:32 use, it would be doc/ user/ kernel/ 2008-10-24 16:33 doc/ user/ kernel/ README COPYING INSTALL Makefile 2008-10-24 16:33 something like that 2008-10-24 16:34 and the Makefile from the root will build both user and kernel? 
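To make the get_block contract concrete: for an inode and a logical block number, the filesystem fills in `b_blocknr` in a buffer. Before that, block_read_full_page works out which logical blocks the page covers. In fs/buffer.c of this era the first one is `page->index << (PAGE_CACHE_SHIFT - inode->i_blkbits)`, with one buffer per `1 << i_blkbits` bytes. The sketch below models both steps in user space, plus the "just return the address" alternative suggested above. All names are hypothetical. A 4 KiB page and a 0-means-hole convention are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12                            /* assume 4 KiB pages */
#define MAX_BUFS (1 << (TOY_PAGE_SHIFT - 9))         /* 512-byte blocks at minimum */

struct toy_inode {
	unsigned blkbits;        /* log2(block size), like inode->i_blkbits */
	const uint32_t *map;     /* logical -> physical block; 0 means hole */
	uint32_t nmap;
};

struct toy_bh {
	uint64_t b_blocknr;      /* filled in by the get_block-style callback */
	int mapped;
};

/* get_block style: the answer is passed back through the buffer. */
static int toy_get_block(const struct toy_inode *in, uint64_t iblock, struct toy_bh *bh)
{
	if (iblock < in->nmap && in->map[iblock]) {
		bh->b_blocknr = in->map[iblock];
		bh->mapped = 1;
	} else {
		bh->mapped = 0;  /* hole: a reader would zero-fill this buffer */
	}
	return 0;
}

/* The alternative suggested in the log: just return the address (0 = hole). */
static uint64_t toy_map_block(const struct toy_inode *in, uint64_t iblock)
{
	return iblock < in->nmap ? in->map[iblock] : 0;
}

/* Map every buffer of page number 'index', deriving the first logical block
 * the way block_read_full_page does. Returns the number of buffers. */
static int toy_map_page(const struct toy_inode *in, uint64_t index,
			struct toy_bh bhs[MAX_BUFS])
{
	assert(in->blkbits >= 9 && in->blkbits <= TOY_PAGE_SHIFT);
	int nr = 1 << (TOY_PAGE_SHIFT - in->blkbits);
	uint64_t iblock = index << (TOY_PAGE_SHIFT - in->blkbits);
	for (int i = 0; i < nr; i++)
		toy_get_block(in, iblock + i, &bhs[i]);
	return nr;
}
```

With 1 KiB blocks a page holds four buffers, so page 1 covers logical blocks 4 to 7.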
2008-10-24 16:34 and user/ will eventually have fsck.tux3 2008-10-24 16:34 I guess we won't try to build kernel 2008-10-24 16:34 probably just call the makefile in user 2008-10-24 16:35 it could possibly extract a patch from a mercurial repo it knows about 2008-10-24 16:35 err 2008-10-24 16:35 git repo for kernel stuff 2008-10-24 16:35 it could also do make tarball and make docs 2008-10-24 16:35 or make deb even 2008-10-24 16:36 ok, anyway /test/ is going to go away right now 2008-10-24 16:37 sucks hg and git don't really understand the concept of mv 2008-10-24 16:37 hg does! 2008-10-24 16:37 I mean I moved some stuff around using it 2008-10-24 16:37 renamed a directory 2008-10-24 16:38 "rename files; equivalent of copy + remove" 2008-10-24 16:38 it fakes it 2008-10-24 16:38 yup 2008-10-24 16:38 but the history is not lost 2008-10-24 16:41 it is kind of 2008-10-24 16:41 it doesn't have the notion that the moved object is the same 2008-10-24 16:41 in fact hg and git really don't have notions of objects, only of equality 2008-10-24 16:41 equality of file text 2008-10-24 16:41 deficiency 2008-10-24 16:41 why is not enought? :D 2008-10-24 16:42 but it works sort of ok most of the time 2008-10-24 16:42 well for example if a person changes the name, you know they're the same person, even if they change hats too 2008-10-24 16:43 so if in a changeset you both mv and edit the mved file, git and hg lose track 2008-10-24 16:44 true 2008-10-24 16:44 Depends on how seriously it is modified 2008-10-24 16:44 I changed a file for less than 10% and it was still tracked 2008-10-24 16:45 this was in hg? 2008-10-24 16:45 or git? 
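Git does not record renames at all. When diffing, it pairs a deleted file with an added one if their contents are similar enough (50% by default for `git diff -M`). That is the kind of heuristic being described here. The sketch below uses a much cruder, hypothetical line-matching score. It only shows why a move combined with a heavy edit falls below the threshold; it is not git's algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Percentage of lines shared by two versions of a file (hypothetical
 * line-membership score, relative to the longer version). */
static int similarity(const char *const *a, int na, const char *const *b, int nb)
{
	int larger = na > nb ? na : nb;
	if (larger == 0)
		return 100;
	int common = 0;
	assert(nb <= 256);
	bool used[256] = {false};  /* each line of b may be matched only once */
	for (int i = 0; i < na; i++)
		for (int j = 0; j < nb; j++)
			if (!used[j] && strcmp(a[i], b[j]) == 0) {
				used[j] = true;
				common++;
				break;
			}
	return common * 100 / larger;
}

/* Treat a (deleted, added) pair as a rename when the score meets threshold. */
static bool looks_like_rename(const char *const *a, int na,
			      const char *const *b, int nb, int threshold)
{
	return similarity(a, na, b, nb) >= threshold;
}
```

The score only compares contents, so identity is inferred rather than tracked. That is the deficiency raised above.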
2008-10-24 16:47 git 2008-10-24 16:47 ok there we go 2008-10-24 16:48 On less than 10% it says 'moved' more was a rewrite iirc 2008-10-24 16:48 one can imagine the messy heuristics to do that 2008-10-24 16:48 accuracy is always better 2008-10-24 16:49 Wine can still trace some code that was rewritten a lot of times and moved various things around 2008-10-24 16:51 git-blame even found some wine 0.0.2 code 2008-10-24 16:53 Q: what does bmap? 2008-10-24 16:53 the original Monotone relied even more heavily on heuristcs for rename 2008-10-24 16:53 (the readpage looks pretty mess in minix :|) 2008-10-24 16:53 there are always places where it causes strange behavior 2008-10-24 16:54 bmap queries the filesystem about the physical location of a given logical block of a file 2008-10-24 16:54 bmap is problematic for filesystems that do deferred allocation 2008-10-24 16:54 static sector_t minix_bmap(struct address_space *mapping, sector_t block) 2008-10-24 16:54 or online defrag, or shrink, or remap 2008-10-24 16:55 so I just need to properly construct the mapping and then ask for the bloks one by one, right? 2008-10-24 16:55 yes 2008-10-24 16:55 bmap doesn't always work, for the reasons above 2008-10-24 16:55 an inherently racy interface 2008-10-24 16:56 so I should make the readpage work instead? 
:D 2008-10-24 16:56 readpage isn't exposed to userspace 2008-10-24 16:56 that's the point of bmap 2008-10-24 16:56 whay of userspace, principally lilo, to find physical block locations 2008-10-24 16:57 using bmap inside kernel is unconscionable, but probably there are cases 2008-10-24 16:58 I'm going back to readpage :P 2008-10-24 16:58 right 2008-10-24 16:59 note that the readpage interface is also racy, relying on the cached physical block address as it does 2008-10-24 16:59 there is nothing to prevent the physical block address from moving before being read 2008-10-24 17:00 filesystem has to provide that locking 2008-10-24 17:02 so I guess I could easily end up in a bad state by calling stuff in an unexpected order... 2008-10-24 17:03 if your filesystem supports online defrag, shrink, remapping or delayed allocation 2008-10-24 17:51 sk8 oclock 2008-10-24 17:51 going to be a sunset skate 2008-10-24 17:54 enjoy :D 2008-10-24 18:16 ACTION is trying to find who should properly set the i_blkbits in an inode 2008-10-24 18:17 and the answer is: alloc_inode 2008-10-24 19:02 wow... alloc_page_buffers doesn't make a circular list of bh 2008-10-24 19:03 it just make a NULL-terminat list 2008-10-24 19:03 and create_empty_buffers finished the job 2008-10-24 19:18 (again thinking out loud) actually... I don't think I need to create the buffer links to cover the page 2008-10-24 19:18 I can use one 2008-10-24 19:19 get the real address using the get_block and then call the bread to get the data 2008-10-24 19:20 hmm... my __bread also allocated a bh 2008-10-24 19:24 http://lxr.linux.no/linux+v2.6.26/fs/minix/itree_common.c#L145 more complicated things are taking places... 2008-10-24 19:24 because of the indirect blocks 2008-10-24 19:25 back 2008-10-24 19:26 welcome! 
:D 2008-10-24 19:27 I'm just making some noise here :P 2008-10-24 19:27 razvanm, don't forget that the page buffers are used not just for once off IO but to cache the physical block address 2008-10-24 19:28 so it's probably best to fully populate the page 2008-10-24 19:28 some filesystems may make assumptions 2008-10-24 19:28 I will populate the page 2008-10-24 19:28 but I don't want to keep around the bhs 2008-10-24 19:28 do the filesystems expect them to be around a lot? 2008-10-24 19:29 what I'm implementing right now is block_read_full_page 2008-10-24 19:29 http://lxr.linux.no/linux+v2.6.26.6/fs/block_dev.c#L71 <- blkbits for the inode for a block device is set here 2008-10-24 19:29 probably bogus 2008-10-24 19:29 I solved that thing :P 2008-10-24 19:29 it was a bug in my alloc_inode :D 2008-10-24 19:31 I wonder what costs more, having an extra indirection in the inode to find the block shift, or bloating up all inodes by an extra word? 2008-10-24 19:31 let me try to formulate a coherent question: is ok if I fill implement block_read_full_page without maintaining the bh for it? 2008-10-24 19:31 I suspect the latter is more costly 2008-10-24 19:31 yes, certainly 2008-10-24 19:32 some filesystems do that 2008-10-24 19:32 yeah! romfs :P 2008-10-24 19:32 though they skip block_read_full_page entirely 2008-10-24 19:32 and just do what has to be done directly 2008-10-24 19:32 which I think is going to be the strategy for tux3 2008-10-24 19:32 great! :D 2008-10-24 19:33 slightly larger code in return for avoiding a huge pile of donkey dung 2008-10-24 19:33 minix_readpage is calling only one function: the block_read_full_page 2008-10-24 19:33 -!- ChanServ changed mode/#tux3 -> +o flips 2008-10-24 19:34 changing the title? 2008-10-24 19:34 http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 pm Pacific Time ~ Next session: is ->t_block really necessary? 
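The buffer-list observation above can be modeled as follows. alloc_page_buffers hands back a NULL-terminated list linked through `b_this_page`. create_empty_buffers then closes it into a ring, which the rest of the buffer code walks with `do { ... } while (bh != head)`. The `toy_*` names are hypothetical, and only offsets are tracked, not data or state bits.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_bh {
	struct toy_bh *b_this_page;  /* next buffer on the same page */
	unsigned offset;             /* byte offset of this buffer within the page */
};

/* Like alloc_page_buffers: build a NULL-terminated list by pushing on the
 * front, so the head ends up at offset 0. Returns NULL on allocation failure. */
static struct toy_bh *toy_alloc_page_buffers(unsigned page_size, unsigned size)
{
	assert(size > 0 && page_size % size == 0);
	long step = (long)size;
	struct toy_bh *head = NULL;
	for (long off = (long)page_size - step; off >= 0; off -= step) {
		struct toy_bh *bh = malloc(sizeof(*bh));
		if (!bh) {
			while (head) {
				struct toy_bh *next = head->b_this_page;
				free(head);
				head = next;
			}
			return NULL;
		}
		bh->offset = (unsigned)off;
		bh->b_this_page = head;
		head = bh;
	}
	return head;
}

/* Like create_empty_buffers: find the tail and close the list into a ring. */
static void toy_close_ring(struct toy_bh *head)
{
	struct toy_bh *bh = head, *tail = NULL;
	while (bh) {
		tail = bh;
		bh = bh->b_this_page;
	}
	if (tail)
		tail->b_this_page = head;
}

/* Walk the ring with the usual do/while idiom. */
static int toy_ring_count(struct toy_bh *head)
{
	int n = 0;
	struct toy_bh *bh = head;
	do {
		n++;
		bh = bh->b_this_page;
	} while (bh != head);
	return n;
}

/* Break the ring after head, then free linearly (head is freed last). */
static void toy_free_ring(struct toy_bh *head)
{
	if (!head)
		return;
	struct toy_bh *bh = head->b_this_page;
	head->b_this_page = NULL;
	while (bh) {
		struct toy_bh *next = bh->b_this_page;
		free(bh);
		bh = next;
	}
}
```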
2008-10-24 19:34 -!- flips changed topic to "http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 pm Pacific Time ~ Next session: is ->t_block really necessary?" 2008-10-24 19:34 attempting to 2008-10-24 19:34 it worked :D 2008-10-24 19:34 -!- flips changed topic to "http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 pm Pacific Time ~ Next session: is ->get_block really necessary?" 2008-10-24 19:35 -!- ChanServ changed mode/#tux3 -> -o flips 2008-10-24 19:35 I was about to ask who is t_block :P 2008-10-24 19:35 if the FS is implementing the readpage directly then there should be no need for get_block, right? 2008-10-24 19:36 at least on the read part 2008-10-24 19:36 (the only one I explored so far) 2008-10-24 19:40 ->get_block is called in lots of places besides readpage 2008-10-24 19:42 those places should be some sort of generic code that tries to simplify the FS, right? 2008-10-24 19:43 by systematically avoiding the generic code the need for get_block should go away 2008-10-24 19:43 not using the generic stuff would also help when porting to another OS :P 2008-10-24 20:05 great! my first call to get_block from minix really worked :D 2008-10-24 20:05 time to go home 2008-10-24 20:21 hey flips 2008-10-24 20:21 hi 2008-10-24 20:21 one more week until the big cabal 2008-10-24 20:21 how's it going with atomic commits ? 2008-10-24 20:21 thursday next 2008-10-24 20:22 on the 30th ? 2008-10-24 20:22 following the thread with matt dillon? 2008-10-24 20:22 yes 2008-10-24 20:22 no, ust he initial post 2008-10-24 20:23 flips: can't find an email link on your new home page 2008-10-24 20:23 you might like to fix that or point out that I'm wrong 2008-10-24 20:24 if it's hard to find it needs to be fixed 2008-10-24 20:24 there is is 2008-10-24 20:24 it is 2008-10-24 20:24 should also link it from the front page, agreed 2008-10-24 20:25 yeah, that would be best, having difficulty finding it 2008-10-24 20:28 do you have a link to the discussion btw ? 
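bmap is one of those other callers: in 2.6.26, minix_bmap simply passes minix_get_block to generic_block_bmap. Below is a toy version of the logical-to-physical lookup it answers. It uses direct pointers in the inode plus one single-indirect block, which is the extra read that makes itree_common.c more involved. The layout is an assumption for illustration: 7 direct slots, 256 pointers per indirect block, and a 16-block in-memory "disk". Double and triple indirection are left out.

```c
#include <assert.h>
#include <stdint.h>

#define NDIRECT        7
#define PTRS_PER_BLOCK 256   /* e.g. 1 KiB blocks of 32-bit pointers */
#define NDISK          16

struct toy_disk {
	uint32_t blocks[NDISK][PTRS_PER_BLOCK];  /* only indirect blocks are stored */
};

struct toy_inode {
	uint32_t direct[NDIRECT];
	uint32_t indirect;  /* disk block holding further pointers; 0 if none */
};

/* Physical block for a logical block; 0 for a hole or an unmodeled range. */
static uint32_t toy_bmap(const struct toy_disk *d, const struct toy_inode *in,
			 uint32_t block)
{
	if (block < NDIRECT)
		return in->direct[block];
	block -= NDIRECT;
	if (block < PTRS_PER_BLOCK) {
		if (!in->indirect)
			return 0;
		assert(in->indirect < NDISK);
		return d->blocks[in->indirect][block];  /* the extra "read" */
	}
	return 0;  /* double/triple indirect not modeled */
}
```

Nothing pins the returned number in place. Under delayed allocation, defrag or shrink it can be stale by the time the caller uses it, which is the race noted above.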
2008-10-24 20:29 http://tux3.org/pipermail/tux3/ 2008-10-24 20:31 flips: matt is a great technical ally btw 2008-10-24 20:31 bh, check it out now 2008-10-24 20:31 that's putting it mildly 2008-10-24 20:31 imo, he's the best overall kernel engineer in all of the BSDs minus the original BSD/OS folks 2008-10-24 20:32 matt by the way is responsible for inspiring the linux 2.6 vm design 2008-10-24 20:32 flips: still don't see the link yet 2008-10-24 20:32 flips: yeah, I was behind the scenes in those dats 2008-10-24 20:32 days 2008-10-24 20:32 I saw the entire thing traspire 2008-10-24 20:32 transpire 2008-10-24 20:32 I'll be back later 2008-10-24 20:32 ok 2008-10-24 20:33 try shift-reload 2008-10-24 20:35 still nothing different 2008-10-24 20:42 what browser? 2008-10-24 20:42 something about how it handles frames? 2008-10-24 21:43 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-24 21:56 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-24 22:16 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-25 00:54 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-25 02:09 -!- Kirantpatil(~kiran@122.167.205.69) has joined #tux3 2008-10-25 02:09 -!- Kirantpatil(~kiran@122.167.205.69) has left #tux3 2008-10-25 02:24 -!- shapor(~shapor@yzf.shapor.com) has joined #tux3 2008-10-25 02:34 -!- pgquiles(~pgquiles@156.Red-88-25-133.staticIP.rima-tde.net) has joined #tux3 2008-10-25 08:56 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-25 10:20 -!- nataliep(~nataliep@cpe-76-170-3-242.socal.res.rr.com) has joined #tux3 2008-10-25 10:56 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-25 14:50 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-25 16:39 -!- pgquiles(~pgquiles@29.Red-81-33-102.dynamicIP.rima-tde.net) has joined #tux3 2008-10-25 17:41 why does the minix_readdir always calls the 
filler with DT_UNKOWN? http://lxr.linux.no/linux+v2.6.26/fs/minix/dir.c#L135 2008-10-25 18:27 flks 2008-10-25 18:31 flks? 2008-10-25 18:38 aaaa I should be able to find if it's a directory using the inode->i_mode... 2008-10-25 19:00 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-25 19:20 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-25 20:28 ACTION is happy. He managed to read a whole minixfs. :P 2008-10-25 23:17 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-26 00:26 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-26 00:38 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-26 05:16 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-26 05:23 -!- MaZe1(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-26 08:59 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-26 12:00 -!- vcgomes(~vcgomes@li17-238.members.linode.com) has joined #tux3 2008-10-26 13:03 -!- pgquiles(~pgquiles@19.Red-83-44-236.dynamicIP.rima-tde.net) has joined #tux3 2008-10-26 13:10 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-26 13:20 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-26 15:13 -!- vcgomes(~vcgomes@li17-238.members.linode.com) has joined #tux3 2008-10-26 16:36 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-26 18:45 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-26 19:10 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-26 21:19 -!- Kirantpatil(~kiran@122.167.176.141) has joined #tux3 2008-10-26 21:19 -!- Kirantpatil(~kiran@122.167.176.141) has left #tux3 2008-10-26 22:27 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-26 22:30 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 
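On the DT_UNKNOWN question: minix directory entries carry only an inode number and a name. Reporting a real type from readdir would therefore mean reading each child inode to get its `i_mode`. DT_UNKNOWN is a legitimate answer, and callers are expected to fall back to stat(2). If the mode is at hand, the conversion is mechanical: on Linux each DT_x value equals the S_IFx file-type bits shifted right by 12. The constants below are local copies (prefixed `TOY_`), so the sketch does not depend on feature-test macros.

```c
#include <assert.h>

/* File-type bits of i_mode; same octal values as Linux's S_IF* constants. */
#define TOY_S_IFMT   0170000
#define TOY_S_IFIFO  0010000
#define TOY_S_IFCHR  0020000
#define TOY_S_IFDIR  0040000
#define TOY_S_IFBLK  0060000
#define TOY_S_IFREG  0100000
#define TOY_S_IFLNK  0120000
#define TOY_S_IFSOCK 0140000

/* d_type values; on Linux each DT_x equals S_IFx >> 12. */
enum {
	TOY_DT_UNKNOWN = 0, TOY_DT_FIFO = 1, TOY_DT_CHR = 2, TOY_DT_DIR = 4,
	TOY_DT_BLK = 6, TOY_DT_REG = 8, TOY_DT_LNK = 10, TOY_DT_SOCK = 12
};

/* What a readdir filler could pass instead of DT_UNKNOWN, given i_mode. */
static unsigned char toy_mode_to_dtype(unsigned mode)
{
	switch (mode & TOY_S_IFMT) {
	case TOY_S_IFIFO:  return TOY_DT_FIFO;
	case TOY_S_IFCHR:  return TOY_DT_CHR;
	case TOY_S_IFDIR:  return TOY_DT_DIR;
	case TOY_S_IFBLK:  return TOY_DT_BLK;
	case TOY_S_IFREG:  return TOY_DT_REG;
	case TOY_S_IFLNK:  return TOY_DT_LNK;
	case TOY_S_IFSOCK: return TOY_DT_SOCK;
	default:           return TOY_DT_UNKNOWN;
	}
}
```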
2008-10-27 02:58 -!- pgquiles_(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-10-27 03:16 -!- pgquiles__(~pgquiles@19.Red-83-44-236.dynamicIP.rima-tde.net) has joined #tux3 2008-10-27 04:27 -!- pgquiles(~pgquiles@19.Red-83-44-236.dynamicIP.rima-tde.net) has joined #tux3 2008-10-27 05:04 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-27 05:47 -!- flips(~phillips@phunq.net) has joined #tux3 2008-10-27 06:47 -!- FelipeS(~Felipe@lawn-128-61-122-225.lawn.gatech.edu) has joined #tux3 2008-10-27 07:06 -!- FelipeS(~Felipe@lawn-128-61-122-225.lawn.gatech.edu) has joined #tux3 2008-10-27 07:25 -!- FelipeS_(~Felipe@lawn-128-61-122-225.lawn.gatech.edu) has joined #tux3 2008-10-27 07:33 -!- marcin(~marcin@c-76-23-106-132.hsd1.sc.comcast.net) has joined #tux3 2008-10-27 08:03 -!- mingming(~mingming@c-24-22-117-202.hsd1.or.comcast.net) has joined #tux3 2008-10-27 08:04 good morning mingming 2008-10-27 08:04 flips, good morning:) 2008-10-27 08:05 did you see the benchmarks posted to btrfs mailing list? 
2008-10-27 08:05 shows ext4 doing rather well 2008-10-27 08:07 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-27 08:08 for example: http://btrfs.boxacle.net/repository/single-disk/Initial-compare/Initial-Compare-Single_disk_Mail_server_simulation._num_threads=1.html 2008-10-27 08:08 morning tim_dimm 2008-10-27 08:08 morning flips 2008-10-27 08:09 yo 2008-10-27 09:44 -!- mingming(~mingming@32.97.110.51) has joined #tux3 2008-10-27 10:15 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-27 10:45 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-27 11:02 -!- FelipeS(~Felipe@lawn-128-61-30-224.lawn.gatech.edu) has joined #tux3 2008-10-27 13:18 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-27 13:57 hey 2008-10-27 13:57 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-27 14:22 -!- FelipeS(~Felipe@lawn-128-61-126-196.lawn.gatech.edu) has joined #tux3 2008-10-27 16:35 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-27 17:38 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-27 19:29 quiet around here recently 2008-10-27 19:34 what happens when I'm working the swing shift 2008-10-27 19:45 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3 2008-10-27 19:59 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-27 20:00 shapor, ping 2008-10-27 20:03 here's a good place to start reading today: http://lxr.linux.no/linux+v2.6.27/+code=vfs_rename 2008-10-27 22:03 hey flips 2008-10-27 22:08 hi bh 2008-10-27 22:16 -!- RazvanM(~RazvanM@pool-151-196-13-39.balt.east.verizon.net) has joined #tux3 2008-10-27 22:26 how's it going ? 
2008-10-27 23:56 -!- ajonat(~ajonat@190.48.124.28) has joined #tux3 2008-10-28 00:12 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-28 03:22 -!- RazvanM(~RazvanM@pool-151-196-126-202.balt.east.verizon.net) has joined #tux3 2008-10-28 04:27 -!- pgquiles(~pgquiles@19.Red-83-44-236.dynamicIP.rima-tde.net) has joined #tux3 2008-10-28 09:10 -!- mingming(~mingming@32.97.110.51) has joined #tux3 2008-10-28 09:28 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-28 09:47 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-28 11:03 -!- flips(~phillips@phunq.net) has joined #tux3 2008-10-28 11:15 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-28 12:02 -!- mingming_(~mingming@32.97.110.55) has joined #tux3 2008-10-28 12:48 -!- pgquiles(~pgquiles@19.Red-83-44-236.dynamicIP.rima-tde.net) has joined #tux3 2008-10-28 14:37 -!- mingming(~mingming@32.97.110.51) has joined #tux3 2008-10-28 14:39 -!- pgquiles_(~pgquiles@19.Red-83-44-236.dynamicIP.rima-tde.net) has joined #tux3 2008-10-28 15:18 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-28 15:39 -!- FelipeS(~Felipe@lawn-128-61-20-25.lawn.gatech.edu) has joined #tux3 2008-10-28 19:22 -!- RazvanM(~razvan@pool-151-196-126-202.balt.east.verizon.net) has joined #tux3 2008-10-28 19:23 ACTION is macbookless this time :| 2008-10-28 19:39 hi 2008-10-28 19:43 -!- RalucaM(~ral@pool-151-196-126-202.balt.east.verizon.net) has joined #tux3 2008-10-28 19:43 hi 2008-10-28 19:43 hi RalucaM 2008-10-28 19:49 folks 2008-10-28 19:50 hi 2008-10-28 19:56 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-28 20:08 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-28 20:09 miss anything? 2008-10-28 20:09 (my laptop just crashed on me twice in a row, I'm guessing because of a bad power supply... 2008-10-28 20:09 nope 2008-10-28 20:09 is today's class canceled? 2008-10-28 20:10 I would think not.... 
but maybe? 2008-10-28 20:10 flips: ping 2008-10-28 20:10 :) 2008-10-28 20:11 although to be fair with the latest round of kernel+nvidia+madwifi upgrades the machine seems much less stable then it was in the past... so maybe it wasn't the power supply just bad luck. Still twice in 10 minutes really takes the cake (the running tally is around thrice in the previous two weeks) 2008-10-28 20:12 and before that it was roughly once every 2-3 weeks 2008-10-28 20:12 oh 2008-10-28 20:12 what linux are you using? 2008-10-28 20:13 debian was always quite stable for me :P 2008-10-28 20:13 fc9 with kernel from koji, newest nvidia, madwifi from svn head 2008-10-28 20:14 I see 2008-10-28 20:14 it used to only lock up occasionally (rarely) during suspend to ram 2008-10-28 20:14 but now it's locked up a few times while I was typing on it... which is new 2008-10-28 20:14 maybe I'll leave it in memtest86 overnight 2008-10-28 20:14 hm... maybe it's a hw issue... 2008-10-28 20:15 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-28 20:15 well, the wireless driver is binary hal + opensource code, and it _is_ buggy 2008-10-28 20:15 2.6.27 supposedly fixes that 2008-10-28 20:15 with the ath9k driver 2008-10-28 20:15 but I'm still on 2.6.26.7 2008-10-28 20:16 I just wish I got kernel crashdumps or something out of this 2008-10-28 20:17 I should probably figure out how to use the kdump kernel stuff 2008-10-28 20:17 :) 2008-10-28 20:18 IIRC, /etc/sysconfig/kdump 2008-10-28 20:18 file not present 2008-10-28 20:18 probably have to install something first 2008-10-28 20:18 found an fc6 wiki 2008-10-28 20:35 blame timothy 2008-10-28 20:36 hi 2008-10-28 20:36 ok, shall we start a little late? 2008-10-28 20:37 folks? 
2008-10-28 20:37 ah it is ;-) 2008-10-28 20:37 ponk 2008-10-28 20:37 ok 2008-10-28 20:37 works for me 2008-10-28 20:37 later works for me too :D 2008-10-28 20:38 and here I was about ready to get synergy between laptop and home projector working 2008-10-28 20:38 so what would we rather look at: 1) the get_block question above 2) mysteries of rename? 2008-10-28 20:38 rename 2008-10-28 20:39 rename it is 2008-10-28 20:40 hey flips 2008-10-28 20:40 let's go find where it happens 2008-10-28 20:40 fs/namei.c 2008-10-28 20:40 maybe tux3 rock the world ;) 2008-10-28 20:41 http://lxr.linux.no/linux+v2.6.27/fs/namei.c#L2582 2008-10-28 20:41 beat me ;) 2008-10-28 20:41 by 2 seconds 2008-10-28 20:41 or sys_rename 2008-10-28 20:41 most of this is permission checking and locking 2008-10-28 20:42 may_create, may_delete, just checking perms 2008-10-28 20:43 http://lxr.linux.no/linux+v2.6.26.6/fs/namei.c#L2671 vfs_rename_other - everything but directories 2008-10-28 20:43 i_mutex is the synchronizer 2008-10-28 20:44 what's last_type ? 2008-10-28 20:44 type of the last segment in the path, i.e., the file 2008-10-28 20:45 oops, I missed a huge item in vfs_rename 2008-10-28 20:45 lock_rename 2008-10-28 20:45 I'm still parsing sys_renameat ;-) 2008-10-28 20:46 oh you started at the syscall 2008-10-28 20:46 not much happening there 2008-10-28 20:46 I'm wondering what that mutex_lock_nested is 2008-10-28 20:46 new one for me, let's see how far back it goes 2008-10-28 20:47 lockdep stuff 2008-10-28 20:47 it's just mutex_lock without lockdep warning 2008-10-28 20:47 ah, so it's just a mutex 2008-10-28 20:47 ok 2008-10-28 20:47 the dentry gets created in it 2008-10-28 20:48 don't think so 2008-10-28 20:49 I should have started at do_rename actually 2008-10-28 20:49 we are in do_rename now? 
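`last_type` classifies the final path component, using the enum in include/linux/namei.h (LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND). As we read sys_renameat in 2.6.27, it refuses with -EBUSY unless both the old and new names end in a LAST_NORM component. So `rename("a/.", "b")` never reaches the filesystem. Below is a simplified user-space classifier. It assumes a non-empty path, ignores LAST_BIND and symlinks, and `rename_allowed` is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Same names as the kernel's LAST_* enum; LAST_BIND omitted. */
enum last_type { LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT };

static enum last_type classify_last(const char *path)
{
	assert(path[0] != '\0');
	size_t end = strlen(path);
	while (end > 0 && path[end - 1] == '/')  /* ignore trailing slashes */
		end--;
	if (end == 0)
		return LAST_ROOT;                    /* the path was all slashes */
	size_t start = end;
	while (start > 0 && path[start - 1] != '/')
		start--;
	size_t len = end - start;
	if (len == 1 && path[start] == '.')
		return LAST_DOT;
	if (len == 2 && path[start] == '.' && path[start + 1] == '.')
		return LAST_DOTDOT;
	return LAST_NORM;
}

/* sys_renameat-style gate: only a normal final component can be renamed. */
static int rename_allowed(const char *oldpath, const char *newpath)
{
	return classify_last(oldpath) == LAST_NORM &&
	       classify_last(newpath) == LAST_NORM;
}
```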
2008-10-28 20:49 the first big event is the path_lookup 2008-10-28 20:49 ok 2008-10-28 20:49 yes, we went back 2008-10-28 20:49 to see where the dentries come from 2008-10-28 20:49 ugh, lock_rename 2008-10-28 20:49 I jumped to the run stuff first ;) 2008-10-28 20:50 ok, in linux whenever you have an open file you have a dentry for it 2008-10-28 20:50 right, but on rename the dest might not exist 2008-10-28 20:50 'might' ;-) 2008-10-28 20:50 so the vfs can return the dentry as a handle for a directory/file 2008-10-28 20:50 http://lxr.linux.no/linux+v2.6.27/fs/namei.c#L2625 2008-10-28 20:51 right, a dentry can exist without the underlying object existing 2008-10-28 20:51 #2679 2008-10-28 20:51 shall we go into path_lookup? 2008-10-28 20:51 buckle up your complexity belt 2008-10-28 20:51 http://lxr.linux.no/linux+v2.6.26.6/fs/namei.c#L1133 2008-10-28 20:52 you mean lookup_hash ? or something else 2008-10-28 20:52 oh, 2.6.26 2008-10-28 20:52 ah, sorry 2008-10-28 20:52 I'll move to 2.6.27 2008-10-28 20:52 http://lxr.linux.no/linux+v2.6.27/fs/namei.c#L1045 2008-10-28 20:53 like all the path functions, it eventually calls path_walk 2008-10-28 20:53 this code 2008-10-28 20:53 is totally different in 2.6.27 2008-10-28 20:53 refactored at least 2008-10-28 20:53 really? 2008-10-28 20:54 sys_renameat is huge 2008-10-28 20:54 there's no do_rename 2008-10-28 20:55 true 2008-10-28 20:55 it was killed by Al's cleanup 2008-10-28 20:55 yeah, it needed it 2008-10-28 20:56 ok, let's pick a version 2008-10-28 20:56 do_rename was close to renameat 2008-10-28 20:56 cause that's probably why this didn't make a lot of sense to me ;-) 2008-10-28 20:56 so it's been made renameat 2008-10-28 20:56 well we started a level below 2008-10-28 20:56 ok where were we 2008-10-28 20:57 ah, and there are new names for the walk functions 2008-10-28 20:57 lookup_hash 2008-10-28 20:57 do_path_lookup 2008-10-28 20:57 sys_renameat -> user_path_lookup -> do_path_lookup? 
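path_lookup ultimately resolves a name one `/`-separated component at a time. It looks for a cached dentry under the current directory and, on a miss, asks the filesystem through `inode->i_op->lookup`. The toy below shows only that skeleton. The fixed table standing in for the filesystem, the integer inode numbers and every `toy_*` name are hypothetical. ".", "..", symlinks, mounts and negative dentries are all left out.

```c
#include <assert.h>
#include <string.h>

#define MAX_DENTRIES 32
#define TOY_NAME_MAX 32

struct toy_dentry { int parent; char name[TOY_NAME_MAX]; int ino; };
struct toy_dcache { struct toy_dentry d[MAX_DENTRIES]; int n; };

/* Hypothetical "filesystem" standing in for i_op->lookup: (dir, name) -> ino. */
struct fs_entry { int dir; const char *name; int ino; };
static const struct fs_entry fs_table[] = {
	{1, "usr", 2}, {2, "lib", 3}, {3, "libc.so", 4},
};
static int fs_lookups;  /* how often we fell through to the filesystem */

static int toy_fs_lookup(int dir, const char *name)
{
	fs_lookups++;
	for (size_t i = 0; i < sizeof(fs_table) / sizeof(fs_table[0]); i++)
		if (fs_table[i].dir == dir && strcmp(fs_table[i].name, name) == 0)
			return fs_table[i].ino;
	return -1;  /* ENOENT */
}

static int dcache_find(const struct toy_dcache *dc, int parent, const char *name)
{
	for (int i = 0; i < dc->n; i++)
		if (dc->d[i].parent == parent && strcmp(dc->d[i].name, name) == 0)
			return dc->d[i].ino;
	return -1;
}

/* Resolve an absolute path component by component; the root is inode 1. */
static int toy_path_walk(struct toy_dcache *dc, const char *path)
{
	assert(path[0] == '/');
	int cur = 1;
	const char *p = path;
	while (*p) {
		while (*p == '/')
			p++;
		if (!*p)
			break;
		size_t len = strcspn(p, "/");
		char name[TOY_NAME_MAX];
		if (len >= sizeof(name))
			return -1;  /* ENAMETOOLONG */
		memcpy(name, p, len);
		name[len] = '\0';
		p += len;
		int ino = dcache_find(dc, cur, name);
		if (ino < 0) {                   /* dcache miss: ask the filesystem */
			ino = toy_fs_lookup(cur, name);
			if (ino < 0)
				return -1;
			if (dc->n < MAX_DENTRIES) {  /* cache the positive result */
				struct toy_dentry *de = &dc->d[dc->n++];
				de->parent = cur;
				de->ino = ino;
				memcpy(de->name, name, len + 1);
			}
		}
		cur = ino;
	}
	return cur;
}
```

A second walk of the same path is served entirely from the cache.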
2008-10-28 20:57 where do you see lookup_hash? 2008-10-28 20:57 http://lxr.linux.no/linux+v2.6.27/fs/namei.c#L2679 2008-10-28 20:58 hirofumi, yes 2008-10-28 20:58 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-28 20:58 ok, more refactoring 2008-10-28 20:59 this was previously done within the walk I suppose 2008-10-28 21:00 nameidata 2008-10-28 21:00 the struct that acts as scratchpad for this family of functions 2008-10-28 21:01 in some cases is abused as a substitute for function parameters 2008-10-28 21:01 anyway, the new factoring returns a nameidata instead of dentry 2008-10-28 21:01 and the lookup_hash converts the name in the nameidata to a dentry 2008-10-28 21:02 let's check out nameidata 2008-10-28 21:02 :-) 2008-10-28 21:03 this must be partially for inotify 2008-10-28 21:03 it was around long before inotify was 2008-10-28 21:03 http://lxr.linux.no/linux+v2.6.27/include/linux/namei.h#L18 2008-10-28 21:04 saved_names... never looked at that 2008-10-28 21:04 L27 though 2008-10-28 21:04 probably proc support 2008-10-28 21:04 probably is how symlink traversal was made nonrecursive 2008-10-28 21:05 so you can get meaningful dumps of fd's 2008-10-28 21:05 you get that by following parent links in dentries 2008-10-28 21:06 the "path" in the nameidata... a little oddly named 2008-10-28 21:06 http://lxr.linux.no/linux+v2.6.27/include/linux/path.h#L7 2008-10-28 21:06 it's a dentry/mount pair 2008-10-28 21:07 not immediately obviously what the mount part is for 2008-10-28 21:07 anyway, I wanted to do rename this time 2008-10-28 21:07 not path walk, which I need to review first 2008-10-28 21:08 it keeps changing and it was complex to begin with 2008-10-28 21:08 (side note) an auto-parser which would figure out and auto-annotate code, with a comment, containing where it gets called from and what it calls, what locks are held before and after and during would be useful.... 2008-10-28 21:08 have it ready by friday? 
2008-10-28 21:08 :D 2008-10-28 21:09 ;-) 2008-10-28 21:09 anyway, suffice to say for now that path_walk scans off each segment of the / separated path and looks for a dentry 2008-10-28 21:09 if it doesn't find one, it calls the filesystem 2008-10-28 21:10 inode->i_op->lookup 2008-10-28 21:10 if the filesystem says its a symlink, which yields a new path 2008-10-28 21:10 the details a wickely complex 2008-10-28 21:10 for various reasons 2008-10-28 21:10 including nfs 2008-10-28 21:10 enough on path_walk for now 2008-10-28 21:10 yes 2008-10-28 21:11 let's get back to the rename locking 2008-10-28 21:12 http://lxr.linux.no/linux+v2.6.27/fs/namei.c#L2657 2008-10-28 21:13 http://lxr.linux.no/linux+v2.6.27/fs/namei.c#L1462 <- lock)_rename 2008-10-28 21:13 pretty simple 2008-10-28 21:13 needs to lock in the order parent, child 2008-10-28 21:14 to avoid deadlock 2008-10-28 21:14 per fs sb lock 2008-10-28 21:14 yes that too 2008-10-28 21:15 so we have the per-sb lock, and we will take a lock on each directly 2008-10-28 21:15 directory 2008-10-28 21:15 first, walk the chain of dentries up from one directory to the root, looking for the second directory 2008-10-28 21:15 if we find it, we know the second is ancestor of the first, so take the second lock first 2008-10-28 21:16 otherwise, do the same in the other direction 2008-10-28 21:16 if neither is ancestor of the other, the order doesn't matter 2008-10-28 21:16 why do we still have parent/child locks if there's no parent/child relation? 
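Here is a user-space model of the ancestor-first ordering in lock_rename. It follows the structure of the 2.6.27 code as we read it, with pthread mutexes standing in for `i_mutex` and `s_vfs_rename_mutex`. The subclass argument only feeds lockdep. With CONFIG_DEBUG_LOCK_ALLOC off, the kernel's mutex.h defines `mutex_lock_nested(lock, subclass)` as plain `mutex_lock(lock)`, and the macro below does the same. The `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct toy_dentry {
	struct toy_dentry *d_parent;  /* the root points to itself */
	pthread_mutex_t i_mutex;      /* stands in for d_inode->i_mutex */
};

static pthread_mutex_t s_vfs_rename_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Lockdep annotation only: without lock debugging the subclass is dropped. */
enum { I_MUTEX_NORMAL, I_MUTEX_PARENT, I_MUTEX_CHILD };
#define mutex_lock_nested(m, subclass) pthread_mutex_lock(m)

/* Lock both directories, ancestor first. Returns the "trap": the child of
 * the ancestor on the path to the other directory, or NULL if unrelated. */
static struct toy_dentry *toy_lock_rename(struct toy_dentry *p1, struct toy_dentry *p2)
{
	struct toy_dentry *p;

	assert(p1 != NULL && p2 != NULL);
	if (p1 == p2) {  /* same directory: one lock, no sb-wide mutex */
		mutex_lock_nested(&p1->i_mutex, I_MUTEX_PARENT);
		return NULL;
	}
	pthread_mutex_lock(&s_vfs_rename_mutex);

	for (p = p1; p->d_parent != p; p = p->d_parent)
		if (p->d_parent == p2) {  /* p2 is an ancestor of p1 */
			mutex_lock_nested(&p2->i_mutex, I_MUTEX_PARENT);
			mutex_lock_nested(&p1->i_mutex, I_MUTEX_CHILD);
			return p;
		}
	for (p = p2; p->d_parent != p; p = p->d_parent)
		if (p->d_parent == p1) {  /* p1 is an ancestor of p2 */
			mutex_lock_nested(&p1->i_mutex, I_MUTEX_PARENT);
			mutex_lock_nested(&p2->i_mutex, I_MUTEX_CHILD);
			return p;
		}
	/* Neither is an ancestor of the other: the labels are arbitrary here. */
	mutex_lock_nested(&p1->i_mutex, I_MUTEX_PARENT);
	mutex_lock_nested(&p2->i_mutex, I_MUTEX_CHILD);
	return NULL;
}

static void toy_unlock_rename(struct toy_dentry *p1, struct toy_dentry *p2)
{
	pthread_mutex_unlock(&p1->i_mutex);
	if (p1 != p2) {
		pthread_mutex_unlock(&p2->i_mutex);
		pthread_mutex_unlock(&s_vfs_rename_mutex);
	}
}
```

sys_renameat then compares the looked-up source and target dentries against the returned trap, so a directory is never moved into its own subtree.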
2008-10-28 21:16 though the _nested syntax doesn't make that obvious 2008-10-28 21:17 the answer to that will be found by looking at mutex_lock_nested 2008-10-28 21:18 the int subclass is only advisory 2008-10-28 21:18 oh the parameter is merely for run-time lock checking 2008-10-28 21:18 for automatically checking lock dependencies 2008-10-28 21:18 I suppose there should be a "no dependency" value, don't know why there isn't 2008-10-28 21:19 I think the PARENT CHILD are just arbitrary strings 2008-10-28 21:19 which usually happen to actually match what the locks labeled with them are used for 2008-10-28 21:19 NORMAL, PARENT, CHILD, XATTR, QUOTA 2008-10-28 21:20 # define mutex_acquire(l, s, t, i) lock_acquire(l, s, t, 0, 2, NULL, i) <- ooh, ugly 2008-10-28 21:21 I'm having a hard time believing that all that debugging stuff doesn't create runtime overhead when not used 2008-10-28 21:21 does mutex_lock sleep till it gets the lock? 2008-10-28 21:21 yes 2008-10-28 21:21 I'd guess it can get compiled out 2008-10-28 21:21 via the preprocessor or compiler opt 2008-10-28 21:22 I'll take that on faith 2008-10-28 21:22 I don't see the compiler removing all the extra parameters 2008-10-28 21:22 like the subclass 2008-10-28 21:22 anyway, I think I understand (un)lock_rename 2008-10-28 21:22 right 2008-10-28 21:22 we can look at lockdep another time 2008-10-28 21:23 useful facility that I have never used 2008-10-28 21:23 ACTION doesn't make locking errors 2008-10-28 21:23 usually 2008-10-28 21:24 ok, a couple more reality checks then into vfs_rename 2008-10-28 21:24 locking is just a matter of good design ;-) 2008-10-28 21:24 which is much easier when you're writing your own code from scratch 2008-10-28 21:25 I left off the smilely above 2008-10-28 21:25 everybody makes locking errors 2008-10-28 21:25 (since half the problem is knowing what the design is) 2008-10-28 21:26 vfs_rename is all just permission checks 2008-10-28 21:26 as we saw before 2008-10-28 21:26 the 
rename_other vs rename_directory 2008-10-28 21:26 why don't we abort here on trap != NULL ? 2008-10-28 21:26 back at 2657. 2008-10-28 21:26 oh nevermind 2008-10-28 21:27 we're still dealing with the dirs the stuff we're renaming are in, not the stuff we're renaming itself 2008-10-28 21:27 http://lxr.linux.no/linux+v2.6.27/fs/namei.c#L2567 <- we need to get yet another lock here 2008-10-28 21:27 the lock on a filesystem object we are about to implcitly unlink 2008-10-28 21:28 shouldn't this be a mutex_lock_nested call? 2008-10-28 21:28 doesn't need to be order with respect to source or dest dirs 2008-10-28 21:28 ordered 2008-10-28 21:29 on with respect to the sb rename mutex 2008-10-28 21:29 only 2008-10-28 21:30 anyway, we call the fs ->rename method and that's about all there is to that 2008-10-28 21:30 the fs can happiily work away knowing that all the needed locks are already taken 2008-10-28 21:31 oh can imagine bottlenecks here with mass renames 2008-10-28 21:31 but since one doesn't really see mass renames, it's not a big issue 2008-10-28 21:31 now rename_dir 2008-10-28 21:32 oh, d_move? 2008-10-28 21:32 and renames within the same directory don't use the per-fs-sb lock 2008-10-28 21:32 ah right 2008-10-28 21:32 the dentry cache has to be updated to reflect what the filesystem did to the backing store 2008-10-28 21:33 good observation 2008-10-28 21:34 _other and _dir are almost identical 2008-10-28 21:34 probably should be one function 2008-10-28 21:36 the new_dentry is preemptively unhashed for some reason 2008-10-28 21:37 something of a mystery why 2008-10-28 21:37 this code really suffers from being nearly devoid of comments 2008-10-28 21:38 rehashed later on 2008-10-28 21:38 without explanation 2008-10-28 21:39 homework 2008-10-28 21:39 maybe, comment of dentry_unhash 2008-10-28 21:39 homework: "wtf is the unhash/rehash in vfs_rename_dir all about?" 2008-10-28 21:39 I think this is how the target gets deleted? 
2008-10-28 21:40 http://lxr.linux.no/linux+v2.6.27/fs/namei.c#L2110 2008-10-28 21:40 see rename_other for a target also being unlinked 2008-10-28 21:40 after all a rename will move the source, but kill the destination 2008-10-28 21:40 hence S_DEAD 2008-10-28 21:42 151#define S_DEAD 16 /* removed, but still open directory */ 2008-10-28 21:42 http://lxr.linux.no/linux+v2.6.27/include/linux/fs.h#L151 2008-10-28 21:43 what that's special to directories is not clear 2008-10-28 21:43 regular files can also be removed but still open 2008-10-28 21:43 if S_DEAD, we can't lookup anymore 2008-10-28 21:44 why is it even in the hash then? 2008-10-28 21:44 both good points 2008-10-28 21:45 may be just for d_move? 2008-10-28 21:45 reiserfs refuses to read xattrs for a dead dir 2008-10-28 21:46 ENOENT automatically on readdir 2008-10-28 21:46 S_DEAD isn't used by very many filesystems 2008-10-28 21:46 (that last was vfs) 2008-10-28 21:46 see IS_DEADDIR 2008-10-28 21:47 may_create false in dead dir etc 2008-10-28 21:47 yes 2008-10-28 21:48 can;t create child dentry for dead dir ( in lookup_hash) 2008-10-28 21:48 now, why is it still in the hash? 2008-10-28 21:49 laziness? 2008-10-28 21:50 somebody holds a count on it somehow? 2008-10-28 21:50 seems like a stretch 2008-10-28 21:50 well 2008-10-28 21:51 another day, another crufty bit of linux kernel 2008-10-28 21:52 rename is the ickiest of the vfs namespace functions 2008-10-28 21:52 the others will see clear by comparison 2008-10-28 21:53 hmm 2008-10-28 21:53 this didn't seem that bad 2008-10-28 21:53 MaZe: could be worse, right? :D 2008-10-28 21:53 yup 2008-10-28 21:56 ACTION says thanks for the lesson! 2008-10-28 21:56 ACTION also goes to bed because he wake up very early today. 2008-10-28 21:56 yes, thanks! 2008-10-28 21:56 ok plug in power 2008-10-28 21:57 dentry_unhash in rename seems to be just for strange fs 2008-10-28 21:58 if it cannot handle the case of removing a directory that is still in use by something else.. 
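The effect of S_DEAD on later operations, as read from may_create() and vfs_readdir() during this session, reduces to a flag test. The sketch below uses the flag value from include/linux/fs.h but a hypothetical demo_dir in place of struct inode, and simplified versions of the checks. It does not answer the open question of why the dead dentry stays hashed.

```c
#include <assert.h>
#include <errno.h>

#define S_DEAD 16   /* flag value from include/linux/fs.h in 2.6.27 */
#define IS_DEADDIR(inode) ((inode)->i_flags & S_DEAD)

/* Hypothetical stand-in for an inode: only the flags word matters here. */
struct demo_dir {
	unsigned int i_flags;
};

/* vfs_rename_dir() marks the overwritten target directory dead once the
 * filesystem's ->rename has succeeded. */
static void demo_mark_target_dead(struct demo_dir *target)
{
	target->i_flags |= S_DEAD;
}

/* Simplified may_create(): a dead directory refuses new entries. */
static int demo_may_create(const struct demo_dir *dir, int child_exists)
{
	if (child_exists)
		return -EEXIST;
	if (IS_DEADDIR(dir))
		return -ENOENT;
	return 0;   /* the real function goes on to check permissions */
}

/* Simplified vfs_readdir(): reading a dead directory fails with ENOENT. */
static int demo_readdir(const struct demo_dir *dir)
{
	return IS_DEADDIR(dir) ? -ENOENT : 0;
}
```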
2008-10-28 21:59 oh 2008-10-28 22:03 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-28 22:03 lol 2008-10-28 22:12 pushed the wrong button? 2008-10-28 22:13 nope 2008-10-28 22:14 ran out of battery power 2008-10-28 22:14 had to go find a socket and plug yourself in then 2008-10-28 22:14 nope 2008-10-28 22:14 instead of reaching for power cord 2008-10-28 22:14 I started typing 2008-10-28 22:14 and the system critical shut down 2008-10-28 22:15 since apparently the batter went from 30% to 2% in a couple seconds 2008-10-28 22:15 (clean shutdown though) 2008-10-28 23:14 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-28 23:47 are you already starting to implement atomic commit? 2008-10-29 00:01 -!- pgquiles__(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-10-29 00:12 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-29 00:19 -!- pgquiles_(~pgquiles@19.Red-83-44-236.dynamicIP.rima-tde.net) has joined #tux3 2008-10-29 01:19 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-29 04:44 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-29 06:15 -!- FelipeS(~Felipe@lawn-128-61-120-139.lawn.gatech.edu) has joined #tux3 2008-10-29 06:15 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-29 06:21 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-29 07:26 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-29 07:50 -!- pgquiles(~pgquiles@19.Red-83-44-236.dynamicIP.rima-tde.net) has joined #tux3 2008-10-29 08:39 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-29 08:47 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-29 08:56 -!- RzM|Away(~razvan@pool-151-196-126-202.balt.east.verizon.net) has joined #tux3 2008-10-29 10:27 -!- mingming(~mingming@32.97.110.51) has joined #tux3 2008-10-29 11:49 -!- MaZe(~MaZe@216-239-45-4.google.com) has 
joined #tux3 2008-10-29 11:57 hirofumi, yes 2008-10-29 11:58 oh, great 2008-10-29 11:58 I was thinking about it last week 2008-10-29 11:58 next commit will add date handling, after that it's all atomic commit work 2008-10-29 11:59 I'm writing a post to clarify a few details at the moment 2008-10-29 11:59 it's a fun thing to think about, first new approach to the problem in 15 years 2008-10-29 12:00 the method needs a name 2008-10-29 12:00 isn't it atomic commit? 2008-10-29 12:00 a new kind of atomic commit 2008-10-29 12:00 the first kind used in filesystems was journalling 2008-10-29 12:01 then came logging and tree-based copy on write 2008-10-29 12:01 recursive copy on write 2008-10-29 12:01 ah, yes. 2008-10-29 12:01 this is non-recursive copy on write 2008-10-29 12:01 but that would be a lame name 2008-10-29 12:02 something to think about over the next couple weeks 2008-10-29 12:02 yes, atomic commit. it seems too generic 2008-10-29 12:03 btw, in rollup, do we need to write out modified btree-index? 2008-10-29 12:05 rollup writes out previously modified btree nodes 2008-10-29 12:05 i see 2008-10-29 12:05 and may at the same time modify more btree nodes, which will be written in a future rollup 2008-10-29 12:06 it will be recursive way to root? 2008-10-29 12:07 it eventually goes to the root, yes, but does not create a new root 2008-10-29 12:07 i see. how do we handle root? 2008-10-29 12:07 so there are two essential differences from recursive tree copy on write: 1) the updates are spread out in time, they don't happen on each leaf write 2) does not generate new trees 2008-10-29 12:08 or rather, does not generate multiple trees 2008-10-29 12:08 we have a few fixed locations for root 2008-10-29 12:08 and a sequence number 2008-10-29 12:08 I need to write that in a design note 2008-10-29 12:09 root is modified very rarely 2008-10-29 12:09 oh, i see. tux3 can merge btree-index modification in multiple phase? 
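The root handling mentioned here ("a few fixed locations for root and a sequence number") could look roughly like the following. The record layout, the valid flag (standing in for whatever magic or checksum test is eventually used) and the slot rotation policy are all assumptions for illustration, since the design note had not yet been written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record kept at each of a few fixed root locations. */
struct demo_root {
	uint64_t sequence;     /* bumped every time the root moves */
	uint64_t root_block;   /* where the tree root currently lives */
	int valid;             /* stands in for a magic/checksum test */
};

/* Mount time: trust the valid copy with the highest sequence number.
 * Returns its index, or -1 if no copy is usable. */
static int demo_pick_root(const struct demo_root *copies, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (!copies[i].valid)
			continue;
		if (best < 0 || copies[i].sequence > copies[best].sequence)
			best = (int)i;
	}
	return best;
}

/* Moving the root (rare): write the new record into a slot other than the
 * current newest one, so a torn write still leaves the old root readable. */
static size_t demo_root_update(struct demo_root *copies, size_t n,
			       uint64_t new_root_block)
{
	int cur;
	size_t slot;

	assert(n >= 2);
	cur = demo_pick_root(copies, n);
	slot = cur < 0 ? 0 : ((size_t)cur + 1) % n;
	copies[slot].sequence = cur < 0 ? 1 : copies[cur].sequence + 1;
	copies[slot].root_block = new_root_block;
	copies[slot].valid = 1;
	return slot;
}
```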
2008-10-29 12:09 generally only when the inode table btree index needs an additional level 2008-10-29 12:10 it can 2008-10-29 12:10 um.. 2008-10-29 12:11 the question of whether the current tree state is represented via promises in commit blocks or actual written out index blocks is orthagonal to the phase mechanism 2008-10-29 12:12 orthagonal? 2008-10-29 12:12 ACTION my english skill is too poor 2008-10-29 12:13 "does not affect" 2008-10-29 12:13 "one can be changed without affecting the other" 2008-10-29 12:14 i see. 2008-10-29 12:14 your english skill is fine, I didn't even notice you're not a native speaker 2008-10-29 12:14 oh, it's surprise to me 2008-10-29 12:15 thanks. 2008-10-29 12:16 i'm still thinking about rollup stage... 2008-10-29 12:16 in "Cache state reconstruction" 2008-10-29 12:16 section 2008-10-29 12:17 it says parent blocks of rolled up, will via promises recorded 2008-10-29 12:19 it means parent block will be copy-on-write, then it will be written to new location as new block? 2008-10-29 12:22 yes 2008-10-29 12:22 i see 2008-10-29 12:23 in fact, there is not a copy on wirte 2008-10-29 12:23 because the buffer is in cache 2008-10-29 12:23 ah 2008-10-29 12:23 the block buffer is simply assigned to a new location 2008-10-29 12:23 and the new location becomes a promise 2008-10-29 12:24 actual copy on write of buffers does happen, but it is for a different purpose: to prevent stalls in writing by userspace programs 2008-10-29 12:25 um..., but new one is block on stable image + previous promise 2008-10-29 12:25 yes 2008-10-29 12:26 if I think stable image is original, and new one can be called copy-on-write? 
2008-10-29 12:26 except no copy is done 2008-10-29 12:26 so without a copy, it isn't copy on write 2008-10-29 12:27 a better term is redirect on write 2008-10-29 12:27 now that I think of it, copy on write is incorrect terminology for the algorithm used by btrfs 2008-10-29 12:27 well, let me think about that 2008-10-29 12:28 i see 2008-10-29 12:28 depends how they actually implement it 2008-10-29 12:29 physical remmapping is done in buffer cache? 2008-10-29 12:30 yes 2008-10-29 12:30 during normal operation what we do is make the normal modification to the cached image of index block just as it is implemented now, and at the same time, write a promise to modify the physical block into a commit block 2008-10-29 12:30 rollup does not apply promises, because they are already applied 2008-10-29 12:30 only recover does 2008-10-29 12:30 only recovery does 2008-10-29 12:31 however, rollup optimize(?) promises? 2008-10-29 12:32 i mean it will rewrite/merge dirty index blocks 2008-10-29 12:32 rollup writes out the dirty index block, making the promises no longer necessary, so they can be discarded 2008-10-29 12:32 yes 2008-10-29 12:32 i see 2008-10-29 12:32 what I realized a couple days ago is that promises can be retired out of order 2008-10-29 12:33 and so we need a way to know which promises don't need to be applied any more, because the index block they refer to was already written out 2008-10-29 12:34 um... 2008-10-29 12:34 we don't know in advance what order the index blocks will be written out, because it depends on the pattern of filesystem activity 2008-10-29 12:34 it doesn't have dependency? 2008-10-29 12:34 dependency on what? 2008-10-29 12:35 e.g. previous phase may have parent directory of current phase? 2008-10-29 12:35 did you mean the word "directory" ? 
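A minimal sketch of "redirect on write" as described above, under stated assumptions: the cached buffer keeps its memory and simply gets a new disk address, the cached parent index block is updated in memory and marked dirty without being added to a phase, and a promise describing the parent update is returned for the caller to append to a log block. All names (demo_buffer, demo_index, demo_promise, the bump allocator next_free) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct demo_buffer {
	uint64_t blocknr;          /* disk address this buffer will be written to */
	unsigned char *data;       /* cached contents; never copied below */
};

struct demo_index {
	uint64_t ptr[8];           /* cached child pointers of an index block */
	int dirty;                 /* modified in cache, not yet part of a phase */
};

struct demo_promise {
	uint64_t parent;           /* index block whose on-disk copy is stale */
	unsigned int slot;         /* which pointer inside that block */
	uint64_t newloc;           /* value the pointer must have after recovery */
};

/* Assign the buffer a fresh location without copying its data, update the
 * cached parent in memory, and return the promise to be logged. */
static struct demo_promise demo_redirect(struct demo_buffer *buf,
					 struct demo_index *parent,
					 uint64_t parent_blocknr,
					 unsigned int slot, uint64_t *next_free)
{
	uint64_t newloc = (*next_free)++;
	struct demo_promise p = { parent_blocknr, slot, newloc };

	assert(slot < 8);
	buf->blocknr = newloc;        /* same memory, new disk address */
	parent->ptr[slot] = newloc;   /* cached parent updated in memory only */
	parent->dirty = 1;
	return p;
}
```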
2008-10-29 12:36 directory entry 2008-10-29 12:36 a changed directory entry must be written out in the same phase as the changed inode table block 2008-10-29 12:37 that is a fule that guarantees atomicity 2008-10-29 12:37 a rule 2008-10-29 12:37 we don't actually analyze those dependencies 2008-10-29 12:37 um.. 2008-10-29 12:38 but instead, just see what buffers the filesystem operation changes 2008-10-29 12:38 and add those changed buffers to the current phase 2008-10-29 12:38 previous phase has "foo", and next one has "foo/bar"? 2008-10-29 12:39 there can be a commit between creating foo and foo/bar, that is ok 2008-10-29 12:39 yes 2008-10-29 12:40 but reverse order of phase, bar is orphaned entry? 2008-10-29 12:40 but that can't happen because the phases cannot be completed out of order 2008-10-29 12:41 ah, maybe i missread "retired out of order" 2008-10-29 12:42 right, it's just the promises that can be retired out of order 2008-10-29 12:42 i see 2008-10-29 12:42 the order in which promises can be retired (that is, ignored on recovery) depends on the order in which we add dirty index blocks to the active phase 2008-10-29 12:43 that is a key point I need to mention: we do not normally add dirty index blocks to the active phase 2008-10-29 12:45 umm.. hard to understand yet for me unfortunately 2008-10-29 12:45 but I think you are the closest to understanding 2008-10-29 12:46 thanks, I hope 2008-10-29 12:46 so 2008-10-29 12:46 the reason we don't add dirty index blocks to the active phase is, that would defeat the optimization we do with the promises 2008-10-29 12:46 so instead, we only add them on split or rollup 2008-10-29 12:48 um.. what means "don't add"? 2008-10-29 12:48 we don't dirty those? 2008-10-29 12:48 each phase has a list of buffers that belong to it, and will be written to disk in that phase 2008-10-29 12:48 ah 2008-10-29 12:49 delay? 2008-10-29 12:49 which delay? 2008-10-29 12:49 delay to add dirty index blocks? 
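One way to know which promises can be ignored on recovery, given that rollups may write index blocks out in any order: stamp each promise with a sequence number and remember, per index block, the sequence at which rollup last wrote it out. A promise is replayed only if it is newer than its parent's last flush. This bookkeeping scheme and its names are assumptions for illustration; the discussion only establishes that some such mechanism is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical logged promise: which index block it modifies, and when. */
struct demo_promise_rec {
	uint64_t parent;
	uint64_t seq;
};

/* Hypothetical record that rollup wrote index block 'block' at 'seq'. */
struct demo_flushed {
	uint64_t block;
	uint64_t seq;
};

static uint64_t demo_last_flush(const struct demo_flushed *f, size_t nf,
				uint64_t block)
{
	uint64_t seq = 0;

	for (size_t i = 0; i < nf; i++)
		if (f[i].block == block && f[i].seq > seq)
			seq = f[i].seq;
	return seq;
}

/* Mark which promises still need replaying; returns how many do. */
static size_t demo_promises_to_replay(const struct demo_promise_rec *p, size_t np,
				      const struct demo_flushed *f, size_t nf,
				      int *replay)
{
	size_t n = 0;

	for (size_t i = 0; i < np; i++) {
		replay[i] = p[i].seq > demo_last_flush(f, nf, p[i].parent);
		n += (size_t)replay[i];
	}
	return n;
}
```

With promises against blocks 1 and 2 interleaved, and only block 2 rolled up, the older promise against block 2 is retired while an older promise against block 1 is still live: retirement out of order.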
2008-10-29 12:50 there are dirty blocks not added to a phase, these are the blocks that need to be reconstructed from promises on recovery 2008-10-29 12:50 i see 2008-10-29 12:51 speaking of delay... a phase cannot begin to be written to disk until the next phase has started 2008-10-29 12:52 however, it can be by timeout? 2008-10-29 12:52 ah, yes 2008-10-29 12:52 I think a better trigger is, write queue on the underlying device nearly empty 2008-10-29 12:53 oh, i see 2008-10-29 12:53 so that when the device is not doing anything, at the start of an untar for example, the first phase will be very short 2008-10-29 12:54 sounds very good 2008-10-29 12:54 yes, good for throughput 2008-10-29 12:54 yes 2008-10-29 12:55 btw, in "Phase transition" section 2008-10-29 12:55 Starting a new phase requires incrementing the phase counter in the 2008-10-29 12:55 cached filesystem superblock and flushing all dirty inodes. 2008-10-29 12:55 in this section, "flushing all dirty inodes" means ->write_inode on linux 2008-10-29 12:55 ? 2008-10-29 12:56 but I meant also flushing any dirty blocks (pages in kernel) cached by the inode 2008-10-29 12:57 i see. actual write out... 2008-10-29 12:57 yes, I should have been more specific 2008-10-29 12:58 in starting a new phase, we need to flush dirty buffers? 2008-10-29 12:59 to start a new phase 2008-10-29 12:59 flush the buffers dirtied in the previous phase 2008-10-29 12:59 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-29 13:00 well, flush the _data_ buffers dirtied in the previous phase 2008-10-29 13:00 it's better to say, flush the dirty inode blocks 2008-10-29 13:00 ah, i see 2008-10-29 13:00 ordered-write mode? 
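The phase transition points above (prefer starting a new phase when the device write queue is nearly empty, with a timeout as fallback; opening phase N+1 is what allows phase N to be written) can be sketched as follows. The low-water threshold, struct layout and function names are hypothetical, and flushing the finished phase's dirty inodes and data buffers is left as a comment.

```c
#include <assert.h>

/* Hypothetical cached superblock state relevant to phases. */
struct demo_fs {
	unsigned long phase;      /* phase counter in the cached superblock */
	unsigned long flushing;   /* phase currently being written, 0 if none */
};

/* Start a new phase when the device is nearly idle, or on timeout. */
static int demo_should_start_phase(unsigned int queue_depth,
				   unsigned int low_water, int timed_out)
{
	return queue_depth <= low_water || timed_out;
}

/* Opening the next phase closes the current one, which may now be written.
 * Returns the number of the phase that just became writable. */
static unsigned long demo_start_phase(struct demo_fs *fs)
{
	unsigned long finished = fs->phase++;

	assert(fs->phase > finished);
	/* real code: flush dirty inodes and the data buffers they cache,
	 * then write 'finished' out with its log and commit blocks */
	fs->flushing = finished;
	return finished;
}
```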
2008-10-29 13:01 this is more strict than ordered-write 2008-10-29 13:01 because we write the data blocks to new locations that don't overwrite data in a previous phase 2008-10-29 13:01 it's like data=journal 2008-10-29 13:02 has the same effect, but without writing twice 2008-10-29 13:02 i see 2008-10-29 13:03 I thought data buffers may be in place update 2008-10-29 13:05 thanks. I belive my understanding became more good 2008-10-29 13:09 we might add in place update later as an additional mode 2008-10-29 13:09 like ordered data 2008-10-29 13:11 the advantage is not in terms of speed, because in both cases we write the new blocks only once, but in reducing fragmentation because the data does not have to be relocated 2008-10-29 13:11 yes 2008-10-29 13:12 for a solid state disk, the advantage is very little 2008-10-29 13:12 so we will always want our strict mode for ssd I think 2008-10-29 13:13 ah, yes. it may be important in future 2008-10-29 13:13 I have an ssd now :) 2008-10-29 13:13 my eee 2008-10-29 13:13 oh, too fast :) 2008-10-29 13:14 we don't have good fs for it yet :) 2008-10-29 13:16 btw, are you already thinking about locking rules? 2008-10-29 13:16 yes 2008-10-29 13:16 in some depth 2008-10-29 13:16 oh, great 2008-10-29 13:17 I'm ignore about it for now 2008-10-29 13:17 that's reasonable 2008-10-29 13:17 we can start with a simple lock 2008-10-29 13:18 e.g. per btree big lock? 2008-10-29 13:18 yes 2008-10-29 13:18 i see 2008-10-29 13:18 per inode, the easist thing 2008-10-29 13:19 and one more for modifying the inode table 2008-10-29 13:19 and may be for bitmap? 2008-10-29 13:19 yes 2008-10-29 13:19 allocation lock 2008-10-29 13:19 i see 2008-10-29 13:20 ah 2008-10-29 13:20 in phase transision, we modify bitmap for commit blocks etc.? 2008-10-29 13:20 yes 2008-10-29 13:20 and bitmap change will be written to same phase? 
2008-10-29 13:20 commit blocks are marked as allocated to prevent them from being allocated for other purposes 2008-10-29 13:21 good question 2008-10-29 13:21 it's probably most efficient to write it to the same phase 2008-10-29 13:22 well 2008-10-29 13:22 good question :) 2008-10-29 13:22 i see. I thought it may be in phase commit or related blocks 2008-10-29 13:23 there will be multiple commit blocks per phase 2008-10-29 13:23 so phase commit points those blocks? 2008-10-29 13:24 each commit block points to some number of flushed blocks 2008-10-29 13:24 as many as will fit in the commit 2008-10-29 13:25 yes 2008-10-29 13:25 and all the flushed blocks, plus all the commit blocks, have to be completely written before the commit block for the phase is written 2008-10-29 13:25 it might be better to call those multiple commit blocks, log blocks 2008-10-29 13:25 and reserve the term commit block for the phase commit block 2008-10-29 13:26 sounds good 2008-10-29 13:26 I think that the dirty bitmaps for the log blocks have to be in the same phase, yes 2008-10-29 13:26 i see. how about commit block? 2008-10-29 13:27 commit block also allocate new block? 2008-10-29 13:27 yes 2008-10-29 13:28 I don't see a clear reason why it has to be in the allocation map of its own phase, or in the next phase 2008-10-29 13:29 um.. 2008-10-29 13:30 if crashed, we don't now free blocks until trace phases? 2008-10-29 13:30 now -> know 2008-10-29 13:30 we fall back to the last completed phase 2008-10-29 13:30 which means we found the commit block, and we know that it is allocated 2008-10-29 13:31 ah 2008-10-29 13:32 in recovery, we will mark those as allocated? 
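The per-phase write ordering agreed on here (all flushed blocks and log blocks complete before the single phase commit block) and the number of log blocks a phase needs can be expressed as two small checks. per_log, the number of block pointers one log block can hold, is a hypothetical parameter; the log block format was not specified in this discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Log blocks needed to point at nflushed blocks, per_log pointers each. */
static size_t demo_log_blocks(size_t nflushed, size_t per_log)
{
	assert(per_log > 0);
	return (nflushed + per_log - 1) / per_log;
}

enum demo_kind { DEMO_FLUSHED, DEMO_LOG, DEMO_COMMIT };

/* Check an I/O completion sequence for one phase: the commit block must be
 * the last completion, and there must be exactly one of it. */
static int demo_commit_order_ok(const enum demo_kind *done, size_t n)
{
	if (n == 0 || done[n - 1] != DEMO_COMMIT)
		return 0;
	for (size_t i = 0; i + 1 < n; i++)
		if (done[i] == DEMO_COMMIT)
			return 0;
	return 1;
}
```

On recovery this is what makes "fall back to the last completed phase" sound: a phase whose commit block is found has, by construction, all of its flushed and log blocks on disk.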
2008-10-29 13:33 yes 2008-10-29 13:33 part of reconstructing dirty metadata 2008-10-29 13:33 the phase commit block can be freed after the next phase completes 2008-10-29 13:34 one thing we could do in future, is allow more than one phase on disk 2008-10-29 13:34 yes 2008-10-29 13:34 this gives a very limited form of versioning 2008-10-29 13:35 at the expense of making allocation decisions more difficult 2008-10-29 13:35 it might be useful for something 2008-10-29 13:35 i see 2008-10-29 14:22 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-29 19:37 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-29 20:40 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-29 20:41 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3 2008-10-29 22:11 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-30 01:19 -!- vcgomes[away](~vcgomes@li17-238.members.linode.com) has joined #tux3 2008-10-30 03:54 -!- pgquiles_(~pgquiles@62.43.226.52.static.user.ono.com) has joined #tux3 2008-10-30 04:01 -!- pgquiles__(~pgquiles@19.Red-83-44-236.dynamicIP.rima-tde.net) has joined #tux3 2008-10-30 07:27 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-30 09:29 -!- mingming(~mingming@32.97.110.51) has joined #tux3 2008-10-30 12:08 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-30 12:15 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-30 12:56 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-30 12:57 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-30 13:12 flips: cabal is today right ? 2008-10-30 13:50 bh, postponed due to me having a cold 2008-10-30 13:51 ok 2008-10-30 13:51 when ? 2008-10-30 14:09 stay tuned for further developments 2008-10-30 14:11 we having the 8pm u today? 
2008-10-30 14:19 ok 2008-10-30 14:19 flips: I was going to drive up Friday possibly so that's why I asked 2008-10-30 14:27 oh, it will be next week 2008-10-30 14:37 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-30 14:41 ok 2008-10-30 14:41 I might be in the SF bay area by then 2008-10-30 16:42 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-30 16:53 -!- ajonat(~ajonat@190.48.117.217) has joined #tux3 2008-10-30 17:52 -!- ajonat(~ajonat@190.48.124.161) has joined #tux3 2008-10-30 18:33 -!- mlankhorst(~m@fw1.astro.rug.nl) has joined #tux3 2008-10-30 19:50 -!- RazvanM(~RazvanM@pool-151-196-126-202.balt.east.verizon.net) has joined #tux3 2008-10-30 20:00 ping... 2008-10-30 20:00 -!- RalucaM(~ral@pool-151-196-126-202.balt.east.verizon.net) has joined #tux3 2008-10-30 20:00 hi 2008-10-30 20:00 ACTION is warming up the lxr 2008-10-30 20:01 ACTION is also very sleepy because he has an uptime of about 17h 2008-10-30 20:02 ACTION looks around... 2008-10-30 20:02 ACTION is not particularly sleepy today because he had a decent maintenance downtime window this time around... 2008-10-30 20:02 flips is not around? 2008-10-30 20:03 ACTION searches around for flips 2008-10-30 20:03 ACTION is also back on macbook :P 2008-10-30 20:03 ACTION me on a macbook as well 2008-10-30 20:03 :D 2008-10-30 20:03 macbook is good... 2008-10-30 20:03 hi 2008-10-30 20:03 ups that wasn't a correct action 2008-10-30 20:03 hey 2008-10-30 20:03 on the other hand I did some progress on PPC while the macbook was away ;-) 2008-10-30 20:03 RazvanM: running Mac or linux on the macbook? 
2008-10-30 20:03 MaZe: Mac OS 2008-10-30 20:04 ok, let's look at the block io library today, a little more 2008-10-30 20:04 ah, well, I'm running linux, I like the hardware, but couldn't get used to the OS and switched to running progressing versions of fedora 2008-10-30 20:04 see what's there, and why we call it the library 2008-10-30 20:05 there are actually two layers of block library functions 2008-10-30 20:05 the top layer being the generic_* routines 2008-10-30 20:05 let's start with the which linux ver question 2008-10-30 20:05 2.6.27 2008-10-30 20:05 I slipped last time 2008-10-30 20:06 the generic_* are all called through hooks, which we have seen 2008-10-30 20:06 of the form "if this hook (e.g., ->write) is nonzero then call the function supplied by the filesystem" 2008-10-30 20:07 otherwise the vfs calls a generic function 2008-10-30 20:07 grep generic * | grep EXPORT | wc -l 2008-10-30 20:07 26 2008-10-30 20:07 all in buffer.c and filemap.c? 2008-10-30 20:07 (in fs/) 2008-10-30 20:07 why not have the hooks point to the default and not have to the the if check every time? 
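The dispatch pattern described here, and the alternative raised in the question, side by side. demo_ops, generic_read and myfs_read are hypothetical stand-ins for an operations table, a generic_* library routine and a filesystem override; the NULL-check form is the shape the vfs uses for hooks such as ->read.

```c
#include <assert.h>
#include <stddef.h>

struct demo_ops {
	int (*read)(int n);
};

static int generic_read(int n) { return n; }     /* plays a generic_* helper */
static int myfs_read(int n) { return n * 2; }    /* filesystem override */

/* Current pattern: a NULL hook means "use the generic routine". */
static int dispatch_read(const struct demo_ops *ops, int n)
{
	if (ops->read)
		return ops->read(n);
	return generic_read(n);
}

/* Alternative from the question: fill in defaults once, then call directly. */
static void fill_defaults(struct demo_ops *ops)
{
	if (!ops->read)
		ops->read = generic_read;
}
```

One wrinkle with filling in defaults at registration: many filesystems declare their operation tables const, so the defaults would have to be named in each filesystem's table instead, which is the mass edit mentioned just below.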
2008-10-30 20:08 because we're lame 2008-10-30 20:08 grep generic * | grep EXPORT | cut -d ':' -f 1 | sort | uniq | xargs 2008-10-30 20:08 buffer.c fs-writeback.c inode.c libfs.c locks.c namei.c namespace.c open.c read_write.c splice.c stat.c super.c xattr.c 2008-10-30 20:08 changing it would require a big spam edit to dozens of filesystems, if you can show a benefit such a patch is sometimes accepted 2008-10-30 20:08 50/50 chance, assuming it actually deletes code or makes something more efficient 2008-10-30 20:09 razvanm, so generics are splattered all over the place 2008-10-30 20:09 in keeping with how they came to be: they all started life as specific code used by some filesystem, typically ext2 2008-10-30 20:10 really it would be better if they were all in libfs.c 2008-10-30 20:10 I wonder what's in locks.c 2008-10-30 20:11 ACTION looks 2008-10-30 20:11 libfs.c:EXPORT_SYMBOL_GPL(generic_fh_to_dentry); 2008-10-30 20:11 libfs.c:EXPORT_SYMBOL_GPL(generic_fh_to_parent); 2008-10-30 20:11 libfs.c:EXPORT_SYMBOL(generic_read_dir); 2008-10-30 20:11 locks.c:EXPORT_SYMBOL(generic_setlease); 2008-10-30 20:11 only one is in locks.c 2008-10-30 20:11 libfs.c would appear to be half an idea 2008-10-30 20:12 what do you mean - half an idea? 2008-10-30 20:12 having only those three minor generics in it doesn't fit the name well 2008-10-30 20:12 fh_to dentry is an nfs function by the way 2008-10-30 20:13 (beside the 26 in fs/ there are 10 more in mm/; 9 in mm/filemap.c and one in mm/page-writeback.c) 2008-10-30 20:13 I suppose we could look at how nfs works one session some time down the road 2008-10-30 20:13 ok, I see 2008-10-30 20:13 that will be scary 2008-10-30 20:13 fh - filehandle? 2008-10-30 20:13 so basically nfs cookie? 2008-10-30 20:13 yes 2008-10-30 20:14 we say opaque filehandle and reserve cookie to mean directory cookie 2008-10-30 20:14 how large is it? 2008-10-30 20:14 it? 2008-10-30 20:15 a fh? 
2008-10-30 20:15 yup 2008-10-30 20:15 lots of bytes 2008-10-30 20:15 64 I seem to recall 2008-10-30 20:15 don't quote me 2008-10-30 20:16 one way to find out is to look at the slab cache 2008-10-30 20:16 cat /proc/slabinfo 2008-10-30 20:16 another is to printk(... sizeof(struct fh)); 2008-10-30 20:16 in junkfs 2008-10-30 20:17 ok, we're not going to replace the top layer of fs library functions 2008-10-30 20:17 5*4 bytes it would seem (at least) 2008-10-30 20:17 assuming struct fid is what it is 2008-10-30 20:18 it's defined by the rfs of course 2008-10-30 20:18 rfc 2008-10-30 20:18 v2 fh is 32 bytes 2008-10-30 20:18 v3 is variable up to 64 2008-10-30 20:19 so I recalled sort of correctly 2008-10-30 20:19 anyway 2008-10-30 20:19 ;-) 2008-10-30 20:19 nfs is a whole huge messy topic 2008-10-30 20:19 and is interesting to tux3 only to the extent that we have to do a few things to make that mess work 2008-10-30 20:19 what the fs needs to obey in order to support nfs exporting is probably worth going over 2008-10-30 20:19 doesn't nfs work with any fs? 2008-10-30 20:19 oh, and is interesting to tux3 because one of the prime uses of tux3 will be exporting nfs 2008-10-30 20:20 the fs has to obey some rules in order to support nfs 2008-10-30 20:20 maze, yes it would be worth a homework assignment: "all the places nfs causes pain for a regular fs" 2008-10-30 20:20 isn't that more like a dissertation? 
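To make "opaque filehandle" concrete, here is a server-side encoder that packs an inode number and a generation count into the handle bytes. The size limits are the protocol ones recalled above (a fixed 32 bytes for v2 per RFC 1094, variable up to 64 for v3 per RFC 1813). The layout is a hypothetical example of a common choice, and a real v2 handle would also be padded out to the full 32 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NFS2_FHSIZE 32   /* RFC 1094: fixed-size opaque handle */
#define NFS3_FHSIZE 64   /* RFC 1813: variable length, at most 64 bytes */

/* Hypothetical handle: the client treats the bytes as opaque. */
struct demo_fh {
	unsigned int len;
	unsigned char data[NFS3_FHSIZE];
};

/* Pack inode number and generation. The generation lets the server reject
 * a stale handle whose inode number has since been reused. */
static int demo_encode_fh(struct demo_fh *fh, uint64_t ino, uint32_t gen,
			  unsigned int max_len)
{
	unsigned int need = sizeof ino + sizeof gen;

	if (max_len > NFS3_FHSIZE || need > max_len)
		return -1;
	memcpy(fh->data, &ino, sizeof ino);
	memcpy(fh->data + sizeof ino, &gen, sizeof gen);
	fh->len = need;
	return 0;
}

static int demo_decode_fh(const struct demo_fh *fh, uint64_t *ino, uint32_t *gen)
{
	if (fh->len != sizeof *ino + sizeof *gen)
		return -1;
	memcpy(ino, fh->data, sizeof *ino);
	memcpy(gen, fh->data + sizeof *ino, sizeof *gen);
	return 0;
}
```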
2008-10-30 20:21 for example, there has to be a way to supply stable directory cookies across reboots that fit in 31 bits to support v2 2008-10-30 20:21 that relaxes to 64 bits in v3 2008-10-30 20:21 still painful 2008-10-30 20:21 maze, not really, it's mostly googling 2008-10-30 20:22 there are only a few places filesystems have to do something bizarre and unnatural 2008-10-30 20:22 the directory one is the one I'm directly familiar with because of the pain it caused in htree development 2008-10-30 20:23 btrfs guys are currently going through equivalent pain 2008-10-30 20:23 trying to get their directory scheme to work with nfs 2008-10-30 20:23 reiser never played well with nfs 2008-10-30 20:23 ok 2008-10-30 20:23 hi 2008-10-30 20:23 question: nfs2, is it still widely used? 2008-10-30 20:23 hi hirofumi 2008-10-30 20:24 (seeing as nfs4 is long out...) 2008-10-30 20:24 maze, I haven't seen one for a long time 2008-10-30 20:24 i wake up now 2008-10-30 20:24 and a brand new filesystem could possibly ignore it 2008-10-30 20:24 but since we know how to support it properly, why not? 
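The directory cookie constraint can be illustrated with a cookie derived purely from the entry name, and therefore stable across reboots, truncated to 31 bits for v2 and to 63 bits for v3. FNV-1a is a swapped-in hash chosen only because it is short; it is not what ext3's htree uses. Keeping the top bit clear in the v3 cookie is an assumption, by analogy with the 31-bit limit (a non-negative signed offset). The genuinely painful part, two names that hash to the same cookie, is not handled here.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit FNV-1a over a NUL-terminated name (stand-in hash, not htree's). */
static uint64_t demo_fnv1a64(const char *s)
{
	uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 1099511628211ULL;              /* FNV prime */
	}
	return h;
}

/* v2: top 31 bits of the hash, so the cookie fits a non-negative int32. */
static uint32_t demo_dir_cookie_v2(const char *name)
{
	return (uint32_t)(demo_fnv1a64(name) >> 33);
}

/* v3: 63 bits, keeping the sign bit clear (assumption, see above). */
static uint64_t demo_dir_cookie_v3(const char *name)
{
	return demo_fnv1a64(name) >> 1;
}
```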
2008-10-30 20:25 well, 31 vs 64 bits is a bit off a difference 2008-10-30 20:25 the directory index planned for tux3 doesn't care 2008-10-30 20:25 the original htree would have benefitted 2008-10-30 20:26 ok, we could spend one session on just that: how ext3 dirops handle nfs v2/f3 2008-10-30 20:26 it's rather complex 2008-10-30 20:27 now, let's move on down to the second layer of fs library calls in the read/write path 2008-10-30 20:27 when we looked at generic read/write, we saw that the filesystem does all its work in the ->readpage/->writepage calls 2008-10-30 20:27 at least in the non-mpage forms 2008-10-30 20:28 today, more of the work, by volume, gets done in the unspeakably messy but faster mpage stuff 2008-10-30 20:28 we will look at both 2008-10-30 20:29 but let's consider the _2copy generic function first 2008-10-30 20:29 and follow it into ext3 2008-10-30 20:29 anybody got a url for the ->writepage call in _2copy? 2008-10-30 20:29 ACTION is searching 2008-10-30 20:30 (my standard trick when I need to get up and get my cup of tea for example) 2008-10-30 20:30 generic_perform_write_2copy? 2008-10-30 20:30 uhm it doesn;t? 2008-10-30 20:31 that's the one 2008-10-30 20:31 http://lxr.linux.no/linux+v2.6.27/mm/filemap.c#L2216 2008-10-30 20:31 http://lxr.linux.no/linux+v2.6.27/mm/filemap.c#L2331 2008-10-30 20:32 http://lxr.linux.no/linux+v2.6.27/mm/filemap.c#L2345 ? 
2008-10-30 20:32 2312 status = a_ops->prepare_write(file, page, offset, offset+bytes); 2008-10-30 20:32 there's commit_write a little bit down as well 2008-10-30 20:32 2345 status = a_ops->commit_write(file, page, offset, offset+bytes); 2008-10-30 20:32 right 2008-10-30 20:32 it's a two-prong plug 2008-10-30 20:32 for no good reason 2008-10-30 20:32 :-) 2008-10-30 20:33 hmm, not sure 2008-10-30 20:33 maybe it's needed for partial page writes 2008-10-30 20:33 so if all the filesystem does is supply those two functions, then standard buffered write will just magically work 2008-10-30 20:33 let's see how ext2 supplies them 2008-10-30 20:34 maze, no, there's no good reason 2008-10-30 20:34 as attested to by them being scheduled for eradication 2008-10-30 20:34 finally 2008-10-30 20:34 alway were just a messy wart 2008-10-30 20:35 -!- madhu(~chatzilla@122.252.226.161) has joined #tux3 2008-10-30 20:35 hey all 2008-10-30 20:35 hi bobby 2008-10-30 20:35 hirofumi, what time is in in japan? 2008-10-30 20:35 long time no see 2008-10-30 20:35 I think really early in the morning 2008-10-30 20:35 12:35 2008-10-30 20:35 oh 2008-10-30 20:36 its 9:05 in india :) 2008-10-30 20:36 http://lxr.linux.no/linux+v2.6.27/fs/ext2/inode.c#L783 2008-10-30 20:36 ok, that's fine 2008-10-30 20:36 11:36 in baltimore ;-) 2008-10-30 20:36 midnight is the best time to hack 2008-10-30 20:36 that is true! 2008-10-30 20:36 yes 2008-10-30 20:37 ext2 doesn't implement those functions 2008-10-30 20:37 yup ;-) 2008-10-30 20:37 exactly 2008-10-30 20:37 actually most fs don't 2008-10-30 20:37 RazvanM: when is the next tux3 U? 2008-10-30 20:37 because they are using the ->writepage interface instead 2008-10-30 20:37 looks like: cifs afs gfs2 ecryptfs impement prepare_write 2008-10-30 20:38 which does prepare_ ... 
comit_ 2008-10-30 20:38 ext2 uses ->write_begin and ->write_end 2008-10-30 20:38 bobby: it's right now right here 2008-10-30 20:39 ohk 2008-10-30 20:39 bobby: and flips is the teacher :D 2008-10-30 20:39 and the next one is on tuesday at 8 pm pdt 2008-10-30 20:39 (see topic) 2008-10-30 20:39 hmm, the timings are a bit difficult :( 2008-10-30 20:39 although really the next session should be updated 2008-10-30 20:39 hirofumi, yes, new things 2008-10-30 20:40 the replacement of prepare... commit 2008-10-30 20:40 yes 2008-10-30 20:40 wtill pointlessly a two-prong plug 2008-10-30 20:40 still 2008-10-30 20:40 bobby, the current time seems to work out fine 2008-10-30 20:40 means you have to get up early ;) 2008-10-30 20:40 those are introduced for bug fix 2008-10-30 20:40 flips: yeah :( 2008-10-30 20:41 bug fix? 2008-10-30 20:41 yes 2008-10-30 20:41 hirofumi, got a url? 2008-10-30 20:41 hirofumi: care to elaborate? 2008-10-30 20:41 $ grep prepare_write *.c | grep EXPORT 2008-10-30 20:41 buffer.c:EXPORT_SYMBOL(block_prepare_write); 2008-10-30 20:41 libfs.c:EXPORT_SYMBOL(simple_prepare_write); 2008-10-30 20:41 race condition iirc 2008-10-30 20:42 $ grep commit_write *.c | grep EXPORT 2008-10-30 20:42 buffer.c:EXPORT_SYMBOL(block_commit_write); 2008-10-30 20:44 fs: introduce write_begin, write_end, and perform_write aops 2008-10-30 20:44 2008-10-30 20:44 These are intended to replace prepare_write and commit_write with more 2008-10-30 20:44 flexible alternatives that are also able to avoid the buffered write 2008-10-30 20:44 deadlock problems efficiently (which prepare_write is unable to do). 
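The shape of a buffered write loop built on a two-call interface, one begin/copy/end round per page, can be modeled in userspace. demo_mapping, demo_write_begin and demo_write_end are hypothetical and much simpler than the real ->write_begin/->write_end, which take the file, mapping, position, length and flags and hand back a locked page; the real loop also has to cope with faults while copying from user memory, which is the deadlock the commit message refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 16
#define DEMO_NPAGES 4

/* Hypothetical page cache for one file: a few fixed pages. */
struct demo_mapping {
	char pages[DEMO_NPAGES][DEMO_PAGE_SIZE];
	int dirty[DEMO_NPAGES];
};

/* "write_begin": find the page that will receive the copy, or NULL. */
static char *demo_write_begin(struct demo_mapping *m, size_t pos)
{
	size_t index = pos / DEMO_PAGE_SIZE;

	return index < DEMO_NPAGES ? m->pages[index] : NULL;
}

/* "write_end": mark the page dirty and report how much was copied. */
static size_t demo_write_end(struct demo_mapping *m, size_t pos, size_t copied)
{
	m->dirty[pos / DEMO_PAGE_SIZE] = 1;
	return copied;
}

/* Generic loop: split the write at page boundaries, begin/copy/end each. */
static size_t demo_perform_write(struct demo_mapping *m, size_t pos,
				 const char *buf, size_t len)
{
	size_t written = 0;

	while (written < len) {
		size_t offset = pos % DEMO_PAGE_SIZE;
		size_t bytes = DEMO_PAGE_SIZE - offset;
		char *page;

		if (bytes > len - written)
			bytes = len - written;
		page = demo_write_begin(m, pos);
		if (!page)
			break;                       /* past the end of this toy file */
		memcpy(page + offset, buf + written, bytes);
		written += demo_write_end(m, pos, bytes);
		pos += bytes;
	}
	return written;
}
```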
2008-10-30 20:44 commitid of git is afddba49d18f346e5cc2938b6ed7c512db18ca68 2008-10-30 20:45 so this is how stuff gets added without removing the old ones :P 2008-10-30 20:45 http://lxr.linux.no/linux+v2.6.27/mm/page-writeback.c#L974 <- ok, here is the generic writepage call, as opposed to the prepare/commit interface in _2copy 2008-10-30 20:46 http://lxr.linux.no/linux+v2.6.27/mm/page-writeback.c#L991 <- generic_writepages 2008-10-30 20:46 http://lxr.linux.no/linux+v2.6.27/mm/page-writeback.c#L866 <- write_cache_pages 2008-10-30 20:47 936 ret = (*writepage)(page, wbc, data); 2008-10-30 20:47 so... I retract the claim about _2copy, and now assert that if all you implement is ->writepage, the vfs will use it via write_cache_pages to implement buffered file write 2008-10-30 20:48 so now let's look at how ext2 implements it 2008-10-30 20:49 http://lxr.linux.no/linux+v2.6.27/fs/buffer.c#L2859 2008-10-30 20:49 http://lxr.linux.no/linux+v2.6.27/fs/ext2/inode.c#L707 <- ext2_writepage 2008-10-30 20:49 way ahead of me ;) 2008-10-30 20:50 so all ext2 does is pass its ext2_get_block back to the vfs block io library 2008-10-30 20:50 we have looked at that call before 2008-10-30 20:50 no need to look again right now, correct? 2008-10-30 20:50 and block_write_full_page is part of the vfs block io library 2008-10-30 20:51 right 2008-10-30 20:51 it does prepare... commit 2008-10-30 20:51 maybe it's been changed to begin...end by now 2008-10-30 20:51 let's see 2008-10-30 20:52 well, no 2008-10-30 20:52 it's basically prepare and commit grafted together 2008-10-30 20:52 with a call to get_block sandwiched in between 2008-10-30 20:53 arguably a candidate for conversion to use the new functions 2008-10-30 20:53 let's look at the ext2 part 2008-10-30 20:53 ext2_get_block 2008-10-30 20:53 we've looked at it briefly before, right?
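The point that a filesystem only has to supply ->writepage comes down to write_cache_pages() walking the dirty pages of a mapping and invoking the callback, as in the quoted `ret = (*writepage)(page, wbc, data);`. Below is a toy user-space model of that callback pattern only. `toy_page`, `toy_mapping`, `toy_write_cache_pages`, `toy_fs_writepage` and `toy_log` are hypothetical, and the real function's range handling, radix-tree tagging, page locking and writeback_control logic are all left out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NPAGES 8

struct toy_page {
    unsigned long index;   /* page offset within the file */
    int dirty;
};

struct toy_mapping {
    struct toy_page pages[TOY_NPAGES];
};

/* Same call shape as the quoted line: (*writepage)(page, wbc, data). */
typedef int (*toy_writepage_t)(struct toy_page *page, void *wbc, void *data);

/* Walk the mapping and hand each dirty page to the filesystem's hook. */
static int toy_write_cache_pages(struct toy_mapping *m, void *wbc,
                                 toy_writepage_t writepage, void *data)
{
    for (size_t i = 0; i < TOY_NPAGES; i++) {
        struct toy_page *page = &m->pages[i];
        if (!page->dirty)
            continue;
        page->dirty = 0;   /* roughly the role of clear_page_dirty_for_io() */
        int ret = (*writepage)(page, wbc, data);
        if (ret)
            return ret;    /* stop at the first error */
    }
    return 0;
}

/* Hypothetical filesystem hook: records which pages it was asked to write. */
struct toy_log {
    unsigned long written[TOY_NPAGES];
    size_t count;
};

static int toy_fs_writepage(struct toy_page *page, void *wbc, void *data)
{
    struct toy_log *wlog = data;
    (void)wbc;
    /* A tux3-style ->writepage would build and submit the I/O itself here. */
    wlog->written[wlog->count++] = page->index;
    return 0;
}
```

With pages 0, 3 and 6 dirty, the walk calls the hook three times in index order and leaves every page clean. In the plan discussed next, a tux3 ->writepage would issue the I/O itself (via submit_bio) instead of calling back into the block library; in this toy that is the spot where the hook merely records the page index.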
2008-10-30 20:54 http://lxr.linux.no/linux+v2.6.27/fs/ext2/inode.c#L694 2008-10-30 20:55 this is the one that, given an inode and a logical offset, traverses the file index to find a physical block number, which it returns by filling in the b_blocknr field of a supplied buffer_head 2008-10-30 20:55 there are a few decorations on that interface, such as telling the caller that the block was newly allocated, so some callers need to zero the buffer 2008-10-30 20:56 now, this is where we want to do things differently in tux3 2008-10-30 20:56 how? 2008-10-30 20:57 are we going to have readpage/writepage? 2008-10-30 20:57 instead of calling back into the fs lib from block_write_full_page, tux3 will just go on to write out the page 2008-10-30 20:57 by calling submit_bio 2008-10-30 20:57 we will implement ->readpage/->writepage, but they won't call the library routines 2008-10-30 20:58 and we maybe won't have to write a tux3_getblock 2008-10-30 20:58 i see 2008-10-30 20:58 that's the interesting question I wanted to address today: can we get away with no tux3_getblock at all, but instead just initiate io ourselves where the vfs calls ->writepage and similar 2008-10-30 20:59 we'll not have a tux3_getblock at all or it will be exposed to the outside of tux3? 2008-10-30 20:59 I'd like to have none at all, though we need something similar to implement bmap 2008-10-30 21:00 (midnight) 2008-10-30 21:00 however that doesn't have to implement all the slightly weird semantics of typical ->get_block 2008-10-30 21:00 razvanm, did you turn into a pumpkin? 2008-10-30 21:00 getblock is pretty weird, because what it could/should return may depend on the future and whether we want to read or write 2008-10-30 21:00 and where's raluca?
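The job of a ->get_block implementation, as described above, can be modeled with a toy inode that holds only a flat table of direct block pointers. ext2 also starts with twelve direct pointers but then follows indirect blocks; this toy simply fails past the direct range. `toy_inode`, `toy_bh`, `toy_get_block` and the bump allocator are hypothetical. The argument order mirrors the kernel's get_block_t (inode, iblock, bh_result, create), and the `new` flag stands in for the buffer state bit that tells callers to zero a freshly allocated block.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NDIRECT 12

struct toy_inode {
    uint32_t map[TOY_NDIRECT];  /* logical block -> physical block, 0 = hole */
};

struct toy_bh {
    uint64_t b_blocknr;  /* physical block number, filled in by toy_get_block */
    int mapped;          /* set when b_blocknr is valid */
    int new;             /* newly allocated: caller must zero it before use */
};

static uint32_t next_free = 100;  /* hypothetical trivial block allocator */

static int toy_get_block(struct toy_inode *inode, uint64_t iblock,
                         struct toy_bh *bh, int create)
{
    bh->mapped = 0;
    bh->new = 0;
    if (iblock >= TOY_NDIRECT)
        return -1;                 /* a real fs would walk indirect blocks */
    if (!inode->map[iblock]) {
        if (!create)
            return 0;              /* hole: left unmapped, readers see zeroes */
        inode->map[iblock] = next_free++;
        bh->new = 1;
    }
    bh->b_blocknr = inode->map[iblock];
    bh->mapped = 1;
    return 0;
}
```

With `create` set, the lookup also allocates. One reading of the later remark that get_block "works at cross purposes with delayed allocation" is exactly this: the physical location gets chosen when the page is dirtied rather than when it is flushed.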
2008-10-30 21:00 :D 2008-10-30 21:00 she's here :P 2008-10-30 21:01 actually, we have a small pumpkin made of plush ;-) 2008-10-30 21:01 maze, it's a braindamaged interface imho, not least because it works at cross-purposes with delayed allocation 2008-10-30 21:01 we don't actually need to allocate physical disk blocks (besides reserving them) until we actually wish to flush to disk 2008-10-30 21:01 ;-) 2008-10-30 21:01 here and quiet :) 2008-10-30 21:01 [not reserving them - reserving space for them 'somewhere'] 2008-10-30 21:01 I was able to make use of the minix_get_block to fill a page ;-) 2008-10-30 21:02 maze, yes, and we are always going to send something to disk when we get a ->writepage call, though it may not be the page we got the ->writepage for 2008-10-30 21:02 right 2008-10-30 21:02 ->writepage also comes to us from deep in vm 2008-10-30 21:02 in shrink_caches 2008-10-30 21:02 wait 2008-10-30 21:02 why will we always send something to disk? 2008-10-30 21:03 we're doing write-through? 2008-10-30 21:03 because either the user or vmm told us we should 2008-10-30 21:03 not write-cache and flush on close 2008-10-30 21:03 ? 2008-10-30 21:03 it's bad behavior for a fs to cache writes for a long time 2008-10-30 21:04 generic behavior is to start the IO transfer inside sys_write 2008-10-30 21:04 yes, it's writethrough 2008-10-30 21:04 uhm, I'd argue with that 2008-10-30 21:04 if the disk is idle, sure, start writing something 2008-10-30 21:04 otherwise...
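The reserve-now, allocate-at-flush idea ("reserving space for them 'somewhere'") can be sketched as two counters. This assumes a hypothetical `toy_volume` with a bump allocator (`toy_reserve` and `toy_flush` are illustrative names); real delayed allocation also has to track per-inode reservations, metadata overhead and fragmented free space, none of which is modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical volume-wide space accounting. */
struct toy_volume {
    uint64_t free_blocks;      /* blocks not yet allocated on disk */
    uint64_t reserved_blocks;  /* promised to dirty data, location not chosen */
    uint64_t next_block;       /* next physical block handed out at flush */
};

/* At write time: only guarantee space exists; do not pick a location. */
static int toy_reserve(struct toy_volume *v, uint64_t nblocks)
{
    if (v->free_blocks - v->reserved_blocks < nblocks)
        return -1;             /* would be ENOSPC */
    v->reserved_blocks += nblocks;
    return 0;
}

/* At flush time: convert reservations into one contiguous physical extent. */
static uint64_t toy_flush(struct toy_volume *v, uint64_t nblocks)
{
    uint64_t start = v->next_block;
    assert(nblocks <= v->reserved_blocks);
    v->reserved_blocks -= nblocks;
    v->free_blocks -= nblocks;
    v->next_block += nblocks;
    return start;              /* first physical block of the extent */
}
```

Starting from ten free blocks, two reservations of four succeed and a third fails, yet nothing is placed on disk until the flush, at which point all eight blocks land in one contiguous extent. That is one concrete form of the benefit of clumping writes mentioned later in the session.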
2008-10-30 21:05 ok, let's start next time by considering the question of where we do _not_ immediately initiate writeout in sys_write 2008-10-30 21:05 very useful exercise 2008-10-30 21:05 :-) 2008-10-30 21:06 how did we do today? what ground did we cover? 2008-10-30 21:06 since there are huge benefits to clumping writes and reads up, we shouldn't write too aggressively 2008-10-30 21:06 ACTION this today's lesson was also short ;-) 2008-10-30 21:06 felt short, true 2008-10-30 21:06 it was an hour 2008-10-30 21:06 went by fast 2008-10-30 21:06 true :D 2008-10-30 21:08 I have to start implementing the write part for minix so today's lesson was informative for me 2008-10-30 21:09 also the area I'm working in at the moment, kind of 2008-10-30 21:09 minix? 2008-10-30 21:09 no, writeout 2008-10-30 21:09 for tux3 2008-10-30 21:10 ah, yes 2008-10-30 21:10 atomic commit, and just exactly what our cache behavior will be 2008-10-30 21:10 RalucaM, are you writing minix? 2008-10-30 21:10 flips, i see 2008-10-30 21:10 hirofumi: I'm trying to use the minix fs from macos ;-) 2008-10-30 21:10 nope 2008-10-30 21:11 i see 2008-10-30 21:11 razvanm, do you talk to the ext3cow guys? 2008-10-30 21:11 I've mostly been going through various primitives in the kernel and trying to familiarize myself with them 2008-10-30 21:11 (atomic ops, mutexes, etc...) 2008-10-30 21:11 flips: the guy graduated and left before I had a chance to benefit from his knowledge :( 2008-10-30 21:11 maze, if you try to do them all you will never do anything but ;) 2008-10-30 21:12 razvanm, it was just one guy?
2008-10-30 21:12 no, not all - just the ones I run into 2008-10-30 21:12 flips: I think so :D 2008-10-30 21:12 ext3cow still seems to be an active project 2008-10-30 21:12 and it still seems to be centered in JHU 2008-10-30 21:13 http://www.ext3cow.com/Developers.html 2008-10-30 21:13 zach left 2008-10-30 21:13 Randal is the prof 2008-10-30 21:13 http://www.ext3cow.com/Blog/Blog.html 2008-10-30 21:13 I hope he'll sign my project when I'm done :D 2008-10-30 21:13 last entry is june 2008-10-30 21:14 indeed 2008-10-30 21:14 hmm, last patch is 2.6.20.3 2008-10-30 21:15 does seem like some loss of momentum 2008-10-30 21:15 I should email randal burns and see if there are plans 2008-10-30 21:16 have you met him? 2008-10-30 21:16 never 2008-10-30 21:17 not so far 2008-10-30 21:17 but maybe eventually 2008-10-30 21:17 http://hssl.cs.jhu.edu/pipermail/ext3cow-devel/2008-October/000064.html 2008-10-30 21:17 last post is today 2008-10-30 21:17 :D so the world is not yet _that_ small 2008-10-30 21:17 nearly that small 2008-10-30 21:19 how working for ise inc 2008-10-30 21:19 yup 2008-10-30 21:19 a bunch of people are there now 2008-10-30 21:22 looks like ext3cow is dead in the water without zachary 2008-10-30 21:23 http://hssl.cs.jhu.edu/pipermail/ext3cow-devel/2008-July/000048.html <- somebody from france forward-ported it to 2.6.25.3 2008-10-30 21:23 did they have something more to add to it? 2008-10-30 21:23 deletion? 2008-10-30 21:23 kernel merge? 2008-10-30 21:24 that is, snapshot deletion 2008-10-30 21:24 deletion inside a snapshot? 2008-10-30 21:24 seems to me, as it is, it will run and make snapshots until the volume is full 2008-10-30 21:24 then game over 2008-10-30 21:24 I will be amazed if they don't have a way to delete a snapshot :D 2008-10-30 21:25 that's what the thread is about 2008-10-30 21:26 > > There are some limitations : 2008-10-30 21:26 > > -The oldest version of a file cannot be deleted with this method.
2008-10-30 21:26 > > -Old versions of directories cannot be deleted. 2008-10-30 21:26 > 2008-10-30 21:27 Nicolas ENG is doing it 2008-10-30 21:27 hmm... 2008-10-30 21:27 anyway I do not expect it is easy 2008-10-30 21:27 but it's not dead 2008-10-30 21:27 that's good 2008-10-30 21:27 I have a question 2008-10-30 21:28 will the economic downturn increase the amount of work in open source communities? 2008-10-30 21:28 or will it be the other way around? 2008-10-30 21:30 probably won't change it much 2008-10-30 21:30 hasn't in the past 2008-10-30 21:30 what it does tend to do is accelerate adoption 2008-10-30 21:30 why is that? 2008-10-30 21:30 other factors are way more important 2008-10-30 21:30 like who happens to be inspired at the time 2008-10-30 21:30 and what kind of tools are used 2008-10-30 21:31 a lot of the work comes from universities 2008-10-30 21:31 which are sheltered from the economy pretty well 2008-10-30 21:31 are they? 2008-10-30 21:31 even for this one? :D 2008-10-30 21:32 as long as you can convince dad to keep sending money ;) 2008-10-30 21:32 brb 2008-10-30 21:33 -!- RazvanM(~RazvanM@pool-151-196-126-202.balt.east.verizon.net) has joined #tux3 2008-10-30 21:33 back 2008-10-30 21:33 during economic booms, high tech companies tend to raid the universities 2008-10-30 21:33 call that cradle robbing 2008-10-30 21:34 that should cool down noticeably this year, in fact it already has 2008-10-30 21:34 which lets students concentrate on a more well-rounded education 2008-10-30 21:35 :-) 2008-10-30 21:35 enrollment in grad school will probably go up :) 2008-10-30 21:36 the undergrad enrollment was pretty low at JHU for some years 2008-10-30 21:37 pretty expensive, isn't it? 2008-10-30 21:37 indeed...
2008-10-30 21:37 >30K 2008-10-30 21:38 yeah that's ridiculous 2008-10-30 21:38 -!- data(~data@echo489.server4you.de) has joined #tux3 2008-10-30 21:39 wow 37700 according to their site 2008-10-30 21:40 the univ could be hit hard if people can no longer afford this 2008-10-30 21:40 per year? 2008-10-30 21:40 yes! 2008-10-30 21:40 that's over 150k for a 4 yr undergrad degree 2008-10-30 21:40 plus books 2008-10-30 21:40 as a business proposition... marginal 2008-10-30 21:40 unless it comes with scholarships 2008-10-30 21:41 which is why grad school is attractive ;-) 2008-10-30 21:41 not so attractive when to watch the advisors writing proposal after proposal to get the funding :| 2008-10-30 21:41 to = you 2008-10-30 21:42 so how do you manage, if you don't mind saying in public? 2008-10-30 21:43 I do my part as best as I can while the advisors are doing the same 2008-10-30 21:44 I come from Ro where research is a luxury that schools don't have 2008-10-30 21:44 so this is heaven from that perspective :P 2008-10-30 21:45 considering the super low number of American students, this is something that doesn't look the same for them 2008-10-30 21:46 I certainly appreciated the opportunity to get to university 2008-10-30 21:46 almost didn't leave ;) 2008-10-30 21:46 how is that? 2008-10-30 21:46 got bitten by the computer hacking bug 2008-10-30 21:46 had lots of primary research going on around me 2008-10-30 21:47 very compelling environment 2008-10-30 21:47 :D 2008-10-30 21:48 i ran away from school as fast as i could after my undergrad 2008-10-30 21:49 easy choice, those were the dot com days 2008-10-30 21:50 contributory cause of the dot bust?
2008-10-30 21:51 acres of cubes full of dropouts learning by doing ;) 2008-10-30 21:51 sort of like this time round 2008-10-30 21:52 hah, actually i finished school in the middle of the dot bomb 2008-10-30 21:53 there's always plenty of jobs right after the 'oh shit we fired too many people' stage 2008-10-30 21:55 there's always plenty of jobs for anybody who can admin/hack their way out of a wet paper bag 2008-10-30 21:56 yeah that too 2008-10-30 22:27 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-30 23:12 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-10-31 04:01 folks 2008-10-31 08:30 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-31 10:09 -!- mingming(~mingming@c-24-22-117-202.hsd1.or.comcast.net) has joined #tux3 2008-10-31 10:12 FYI: prepare_write and commit_write were replaced completely in the current tree 2008-10-31 11:06 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-10-31 11:09 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-10-31 11:57 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-10-31 11:57 lol 2008-10-31 11:58 moving target ;-) 2008-10-31 11:58 replaced with what?
2008-10-31 11:58 ->write_begin and ->write_end 2008-10-31 12:00 it should be a small issue 2008-10-31 12:03 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-10-31 14:13 folks 2008-10-31 16:47 -!- bh(~billh@ip68-107-26-122.sd.sd.cox.net) has joined #tux3 2008-10-31 21:47 -!- RazvanM(~RazvanM@pool-151-196-126-202.balt.east.verizon.net) has joined #tux3 2008-10-31 22:41 -!- bobby(~chatzilla@122.252.226.161) has joined #tux3 2008-10-31 23:50 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-01 02:07 happy halloween 2008-11-01 04:12 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-11-01 09:30 -!- pgquiles(~pgquiles@158.Red-80-39-234.dynamicIP.rima-tde.net) has joined #tux3 2008-11-01 12:44 -!- konrad(~konrad@D-128-208-53-208.dhcp4.washington.edu) has joined #tux3 2008-11-01 12:44 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-11-01 12:44 -!- pgquiles(~pgquiles@158.Red-80-39-234.dynamicIP.rima-tde.net) has joined #tux3 2008-11-01 12:44 -!- RazvanM(~RazvanM@pool-151-196-126-202.balt.east.verizon.net) has joined #tux3 2008-11-01 12:44 -!- bh(~billh@ip68-107-26-122.sd.sd.cox.net) has joined #tux3 2008-11-01 12:44 -!- hirofumi(~hirofumi@210.171.168.39) has joined #tux3 2008-11-01 12:44 -!- data(~data@echo489.server4you.de) has joined #tux3 2008-11-01 12:44 -!- mlankhorst(~m@fw1.astro.rug.nl) has joined #tux3 2008-11-01 12:44 -!- vcgomes[away](~vcgomes@li17-238.members.linode.com) has joined #tux3 2008-11-01 12:44 -!- flips(~phillips@phunq.net) has joined #tux3 2008-11-01 12:44 -!- bushman(~marcin@c-76-23-106-132.hsd1.sc.comcast.net) has joined #tux3 2008-11-01 12:51 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-11-01 13:21 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-11-01 13:58 -!- ajonat(~ajonat@190.48.108.125) has joined #tux3 2008-11-01 15:01 -!- konrad(~konrad@sfr.cs.washington.edu) has joined #tux3 2008-11-01 15:22
-!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-01 17:02 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-01 18:15 -!- ajonat(~ajonat@110-74-17-190.fibertel.com.ar) has joined #tux3 2008-11-01 18:25 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-01 19:28 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-01 22:24 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-01 23:01 -!- Kirantpatil(~kiran@122.167.209.82) has joined #tux3 2008-11-01 23:01 -!- Kirantpatil(~kiran@122.167.209.82) has left #tux3 2008-11-02 08:27 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-02 09:00 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-02 09:56 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-02 11:33 -!- pgquiles(~pgquiles@228.Red-81-35-100.dynamicIP.rima-tde.net) has joined #tux3 2008-11-02 11:38 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-02 11:49 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-11-02 13:10 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-11-02 14:03 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-02 16:13 -!- ajonat(~ajonat@190.48.108.125) has joined #tux3 2008-11-02 18:30 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-02 19:12 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-02 20:05 -!- RazvanM(~RazvanM@pool-151-196-126-202.balt.east.verizon.net) has joined #tux3 2008-11-02 20:42 folks 2008-11-02 22:37 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-03 00:20 -!- pranith(ca4bcee2@webchat.mibbit.com) has joined #tux3 2008-11-03 00:20 hey a;; 2008-11-03 00:20 all* 
2008-11-03 01:02 flips: http://lkml.org/lkml/2007/7/28/186 2008-11-03 01:02 old post from me, not sure if you ever saw this 2008-11-03 07:25 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-03 08:15 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-11-03 08:22 -!- mingming(~mingming@32.97.110.51) has joined #tux3 2008-11-03 08:53 -!- pgquiles(~pgquiles@228.Red-81-35-100.dynamicIP.rima-tde.net) has joined #tux3 2008-11-03 10:16 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-11-03 10:58 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-11-03 12:13 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-03 12:17 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-11-03 13:14 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3 2008-11-03 16:18 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-03 16:50 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-11-03 18:38 -!- ajonat(~ajonat@190.48.112.111) has joined #tux3 2008-11-03 19:48 -!- RazvanM(~RazvanM@pool-151-196-126-202.balt.east.verizon.net) has joined #tux3 2008-11-03 21:33 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-03 23:44 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-04 06:15 -!- mlankhorst_(~m@fw1.astro.rug.nl) has joined #tux3 2008-11-04 06:49 -!- pgquiles(~pgquiles@228.Red-81-35-100.dynamicIP.rima-tde.net) has joined #tux3 2008-11-04 08:46 -!- mingming(~mingming@32.97.110.51) has joined #tux3 2008-11-04 09:09 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3 2008-11-04 09:12 ACTION is going to be away for the rest of the week (due to a conference) 2008-11-04 10:14 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3 2008-11-04 12:02 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-04 14:31 -!- 
tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3 2008-11-04 19:56 -!- RalucaM(~ral@pool-151-196-126-202.balt.east.verizon.net) has joined #tux3 2008-11-04 19:57 hi 2008-11-04 19:57 yo 2008-11-04 20:08 hi 2008-11-04 20:08 now that the election thing's done, should we probe the kernel? 2008-11-04 20:08 :-) 2008-11-04 20:08 oh is it? 2008-11-04 20:08 just 2008-11-04 20:09 some whooping and hollering outside 2008-11-04 20:09 nice and quiet over here 2008-11-04 20:10 looks like a huge win too 2008-11-04 20:10 projections as of yesterday ran from 330 to 350 2008-11-04 20:11 looks like it will be towards the high side 2008-11-04 20:12 http://lxr.linux.no/linux+v2.6.27/ 2008-11-04 20:12 let's take a look at iget 2008-11-04 20:13 if you're ready 2008-11-04 20:13 hi 2008-11-04 20:14 hi 2008-11-04 20:14 today is tux3 u? 2008-11-04 20:14 just starting 2008-11-04 20:14 now that the u.s. election