2008-09-11 00:30 inode table block 0x0/15 (f2c bytes free)
2008-09-11 00:30 0x0: new_xcache: realloc xcache to 9999
2008-09-11 00:30 mode 0000000 uid 0 gid 0 root 4:1 ctime 0 size 200
2008-09-11 00:30 0x2: new_xcache: realloc xcache to 9999
2008-09-11 00:30 mode 0000000 uid 0 gid 0 root 6:1
2008-09-11 00:30 0xa: new_xcache: realloc xcache to 9999
2008-09-11 00:30 mode 0000000 uid 0 gid 0 root a:1
2008-09-11 00:30 0xd: new_xcache: realloc xcache to 9999
2008-09-11 00:30 mode 0040755 uid 0 gid 0 root 8:1
2008-09-11 00:30 0xe: new_xcache: realloc xcache to 9999
2008-09-11 00:30 mode 0100700 uid 0 gid 0 root d:1 ctime 0 size 1008 xattr(s)
2008-09-11 00:30 {1} => 0x805f110: 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 "hello world!"
2008-09-11 00:30 inode 0xe (14) has an extended attribute with atom number 1 and body "hello world!"
2008-09-11 00:31 so an xattr made it into the inode table
2008-09-11 00:31 and onto disk I think
2008-09-11 00:31 need to verify that by trying to get it back
2008-09-11 01:29 nice
2008-09-11 02:16 one step closer to ruling the world
2008-09-11 02:18 flips: what you were trying to do was add reference counting to attributes ?
2008-09-11 02:18 but aborted it ?
2008-09-11 02:18 aborted?
2008-09-11 02:18 no reference counting just now
2008-09-11 02:18 just trying to get xattrs onto disk and back off
2008-09-11 02:18 very close to that now
2008-09-11 02:19 right there was some discussion about it and you decided to go with a simpler approach
2008-09-11 02:19 I decided to go with reference counting
2008-09-11 02:19 yeah, looks like it
2008-09-11 02:19 oh really ?
2008-09-11 02:19 but not just yet
2008-09-11 02:19 have you thought about having extensions for ease of use with samba ?
2008-09-11 02:19 they just want xattrs that work well
2008-09-11 02:20 ok
2008-09-11 02:20 tridge was disappointed with the performance of pretty well every filesystem wrt xattrs
2008-09-11 02:21 it's a hard problem to solve
2008-09-11 02:21 more folks just ignore it
2008-09-11 02:21 more=most
2008-09-11 02:21 generally done badly from what I've seen
2008-09-11 02:21 I like the way it's coming out in tux3
2008-09-11 02:21 store_attrs: Failed assertion "attr == base + size"!
2008-09-11 02:21 Trace/breakpoint trap
2008-09-11 02:21 got to debug
2008-09-11 02:22 ok
2008-09-11 02:23 ah I see the problem
2008-09-11 02:24 the attribute size estimation done for xattrs before saving inode should not include the xattr header size, only the variable data part
2008-09-11 02:32 there, got my xattr back
2008-09-11 02:32 let's see if I can set a new one
2008-09-11 02:32 yay
2008-09-11 02:32 nope, that's the problem, the second set fails
2008-09-11 02:32 but it's progress
2008-09-11 02:33 very definite progress
2008-09-11 02:46 it works now
2008-09-11 02:46 konrad, you can say yay for real ;-)
2008-09-11 02:47 yay for real
2008-09-11 02:47 :-)
2008-09-11 02:47 bug was in a part of the code you worked on
2008-09-11 02:47 for (int kind = MIN_ATTR; kind < VAR_ATTRS; kind++) {
2008-09-11 02:47 but xattrs didn't exist then
2008-09-11 02:48 the attribute encoder now has two parts
2008-09-11 02:48 the part that encodes 'standard' attributes
2008-09-11 02:48 and the part that encodes extended attributes from the xcache
2008-09-11 02:48 ah
2008-09-11 02:49 the standard attribute encoder better not write out headers for extended attributes, which it was doing
2008-09-11 02:49 this part of the code is going to evolve a lot as things progress
2008-09-11 02:50 it gets more complex when versioning arrives
2008-09-11 02:50 then we can't just blindly overwrite the entire set of attributes in the version table
2008-09-11 02:50 because the inode only has the attributes for one version
2008-09-11 02:50 attributes for other versions have to be left alone
2008-09-11 02:50 messy
2008-09-11 02:51 but also some weeks away
2008-09-11 02:51 this code will do for the nonversioning prototype
2008-09-11 02:55 committed
2008-09-11 02:55 enough for today
2008-09-11 02:58 I think I need to reward myself with a pair of these: http://www.skatehut.co.uk/acatalog/Seba_FR1_Skates_-_Orange_White___195.html
2008-09-11 02:59 £200 is a lot in USD
2008-09-11 03:01 can get them for $350 here
2008-09-11 03:01 I think
2008-09-11 03:01 not easy to get
2008-09-11 03:01 americans have dodgy taste in skates ;-)
2008-09-11 03:02 everybody is either fitness or aggressive
2008-09-11 03:02 aggressive skates are just stupid
2008-09-11 03:02 made for only one thing: sliding down rails
2008-09-11 03:02 yeah
2008-09-11 03:02 tiny little wheels
2008-09-11 03:02 don't need wheels for that
2008-09-11 03:02 heh
2008-09-11 03:03 just wear a pair of shoes with no traction :)
2008-09-11 03:03 "extreme walking"
2008-09-11 03:03 right
2008-09-11 03:03 I saw a couple of aggro skaters for the first time on the strand
2008-09-11 03:03 jumping up on things, seemed like fun
2008-09-11 03:03 heh
2008-09-11 03:04 but I can do that on my street skates too
2008-09-11 03:04 really?
2008-09-11 03:04 kind of tough to slide down rails
2008-09-11 03:04 or impossible
2008-09-11 03:04 you have enough space between the middle wheels?
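The 02:21 assertion failure and the two fixes that follow (02:24 and 02:49) come down to one invariant: the size estimated before saving the inode must equal the number of bytes the two-part encoder actually writes. Below is a minimal C sketch of that invariant, assuming a made-up layout (a one-byte kind tag per attribute, fixed-size standard attributes, and a 2-byte atom plus 2-byte length in front of each xattr body). Only MIN_ATTR and VAR_ATTRS come from the log; every other name, and the layout itself, is hypothetical and not tux3's real on-disk format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute kinds: standard kinds below VAR_ATTRS, then xattrs. */
enum {
	MIN_ATTR = 0,
	MODE_ATTR = MIN_ATTR,
	SIZE_ATTR,
	VAR_ATTRS,
	XATTR_ATTR = VAR_ATTRS,
	MAX_ATTRS
};

/* Fixed bytes per attribute: 1-byte kind tag plus fixed payload.
 * For an xattr the fixed part is tag + 2-byte atom + 2-byte length. */
static const unsigned fixed_size[MAX_ATTRS] = {
	[MODE_ATTR] = 1 + 2,
	[SIZE_ATTR] = 1 + 8,
	[XATTR_ATTR] = 1 + 2 + 2,
};

struct xattr { uint16_t atom; uint16_t len; const uint8_t *body; };

struct inode_attrs {
	uint16_t mode;
	uint64_t size;
	const struct xattr *xattrs; /* stands in for the xcache */
	unsigned xcount;
};

static unsigned attr_size(const struct inode_attrs *in)
{
	unsigned size = 0;
	for (int kind = MIN_ATTR; kind < VAR_ATTRS; kind++)
		size += fixed_size[kind];                /* standard attributes */
	size += in->xcount * fixed_size[XATTR_ATTR];     /* one header per xattr */
	for (unsigned i = 0; i < in->xcount; i++)
		size += in->xattrs[i].len;               /* variable part only: header already counted */
	return size;
}

/* Big-endian helpers; the byte order is an arbitrary choice here. */
static uint8_t *put16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
	return p + 2;
}

static uint8_t *put64(uint8_t *p, uint64_t v)
{
	for (int shift = 56; shift >= 0; shift -= 8)
		*p++ = (uint8_t)(v >> shift);
	return p;
}

static unsigned encode_attrs(const struct inode_attrs *in, uint8_t *base)
{
	unsigned size = attr_size(in);
	uint8_t *attr = base;

	/* Part 1: standard attributes. Stopping at VAR_ATTRS means no
	 * header is ever written here for the xattr kind. */
	for (int kind = MIN_ATTR; kind < VAR_ATTRS; kind++) {
		*attr++ = (uint8_t)kind;
		if (kind == MODE_ATTR)
			attr = put16(attr, in->mode);
		else if (kind == SIZE_ATTR)
			attr = put64(attr, in->size);
	}
	/* Part 2: extended attributes, header then body. */
	for (unsigned i = 0; i < in->xcount; i++) {
		const struct xattr *x = &in->xattrs[i];
		*attr++ = XATTR_ATTR;
		attr = put16(attr, x->atom);
		attr = put16(attr, x->len);
		memcpy(attr, x->body, x->len);
		attr += x->len;
	}
	assert(attr == base + size); /* mirrors the store_attrs assertion in the log */
	return size;
}
```

For the 00:30 example (atom 1, body "hello world!"), this layout gives 3 + 9 + 5 + 12 = 29 bytes. Counting the xattr header a second time in the variable-size pass, or letting the standard loop run past VAR_ATTRS, breaks the equality and trips the assertion.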
2008-09-11 03:04 yeah
2008-09-11 03:04 no I don't grind
2008-09-11 03:04 there's a grind plate, I can get up on some things with it
2008-09-11 03:04 yeah
2008-09-11 03:04 ah
2008-09-11 03:05 sounds like you have some experience
2008-09-11 03:05 no
2008-09-11 03:05 I just put on skates for the first time in 6-7 years a few days ago
2008-09-11 03:05 they're kind of a size or size and a half too small
2008-09-11 03:05 ouch
2008-09-11 03:05 yeah
2008-09-11 03:07 started skating down the little stub walls the skateboarders grind on
2008-09-11 03:07 that seems to impress the skateboarders
2008-09-11 03:07 it's easier to do it on one foot
2008-09-11 03:07 probably looks harder though
2008-09-11 03:17 heh
2008-09-11 03:52 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3
2008-09-11 04:14 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3
2008-09-11 04:51 -!- kmeyer(~konrad@c-24-16-74-109.hsd1.mn.comcast.net) has joined #tux3
2008-09-11 05:00 flips: tux3fuse has xattrs now (not my doing)
2008-09-11 05:30 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3
2008-09-11 05:30 -!- pgquiles(~pgquiles@253.Red-83-44-239.dynamicIP.rima-tde.net) has joined #tux3
2008-09-11 05:30 -!- nataliep_(~nataliep@66-102-14-1.google.com) has joined #tux3
2008-09-11 05:30 -!- data(~data@echo489.server4you.de) has joined #tux3
2008-09-11 05:30 -!- eli(~elicriffi@66.249.86.209) has joined #tux3
2008-09-11 05:30 -!- shapor(~shapor@yzf.shapor.com) has joined #tux3
2008-09-11 05:58 -!- konrad(~konrad@c-24-16-74-109.hsd1.mn.comcast.net) has joined #tux3
2008-09-11 05:58 -!- RzM|Away(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-09-11 05:58 -!- vcgomes(~vcgomes@li17-238.members.linode.com) has joined #tux3
2008-09-11 05:58 -!- flips(~phillips@phunq.net) has joined #tux3
2008-09-11 05:58 -!- bh(~billh@ip68-107-26-122.sd.sd.cox.net) has joined #tux3
2008-09-11 07:33 -!- kbingham(~kbingham@193.132.141.186) has joined #tux3
2008-09-11 07:48 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3
2008-09-11 09:02 -!- pgquiles(~pgquiles@253.Red-83-44-239.dynamicIP.rima-tde.net) has joined #tux3
2008-09-11 12:09 -!- MaZe(~MaZe@216-239-45-4.google.com) has joined #tux3
2008-09-11 13:45 -!- kbingham(~kbingham@92.20.246.248) has joined #tux3
2008-09-11 14:04 -!- cdk(~chinmay@121.246.36.119) has joined #tux3
2008-09-11 14:06 konrad, that was amazingly fast of tero hmm? Nice code too.
2008-09-11 14:06 konrad, but you made this all happen, and your code was very decent as well
2008-09-11 14:07 tero is in the real pro category, lots to learn from him
2008-09-11 14:12 -!- cdk(~chinmay@121.246.36.119) has left #tux3
2008-09-11 14:15 -!- cdk(~chinmay@121.246.36.119) has joined #tux3
2008-09-11 14:19 -!- cdk(~chinmay@121.246.36.119) has joined #tux3
2008-09-11 14:31 My new boombox arrived
2008-09-11 14:31 now I can go totally ghetto, skating down to the beach with a ghetto blaster in my hand
2008-09-11 14:32 ACTION is degenerating under the influence of certain skaters
2008-09-11 15:04 haircut time
2008-09-11 15:04 later...
2008-09-11 15:37 http://linux.slashdot.org/article.pl?sid=08/09/11/1913229 <- first time I ever saw the "pigfuckers" tag on slashdot
2008-09-11 15:37 re lenovo caving to msft on shipping linux preinstalls
2008-09-11 16:42 -!- tim_dimm(~timothyhu@adsl-67-114-40-138.dsl.scrm01.pacbell.net) has joined #tux3
2008-09-11 16:43 howdy
2008-09-11 18:06 sk8 o'clock
2008-09-11 18:06 on this skate, I will think about implementation details of atom refcounts
2008-09-11 19:28 that was fun
2008-09-11 19:28 I did a move that got the sk8rs clapping
2008-09-11 19:28 then they said "ok rollerbladers are allowed"
2008-09-11 19:28 in the sk8 park that is
2008-09-11 19:29 got to grab a quick bite, then hopefully we can do chapter two of tux3 university
2008-09-11 19:29 anybody here for that?
2008-09-11 19:30 ACTION nods
2008-09-11 19:47 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-09-11 19:55 hiyah
2008-09-11 19:55 just warming up for the next episode
2008-09-11 19:55 with some pasta and a glass of cabernet
2008-09-11 19:55 ran out of chianti ;-)
2008-09-11 19:56 :D
2008-09-11 19:56 btw, what brand is that ramen you mentioned the other day?
2008-09-11 19:57 important point
2008-09-11 19:57 shin ramyun
2008-09-11 19:57 made by nog shim
2008-09-11 19:58 sorry
2008-09-11 19:58 nong shim
2008-09-11 19:58 "family pack"
2008-09-11 19:58 "gourmet spicy"
2008-09-11 19:58 overdid it a little yesterday, had three packs ;-)
2008-09-11 19:58 don't do that
2008-09-11 19:58 -!- RalucaME(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-09-11 19:58 http://kimchimamas.typepad.com/.shared/image.html?/photos/uncategorized/2007/12/13/nong_shim.jpg ?
2008-09-11 19:59 exactly
2008-09-11 20:00 I like it hot :-)
2008-09-11 20:00 seldom had better ramyun, even in korea
2008-09-11 20:00 it's actually korean of course
2008-09-11 20:00 got it from a japanese grocery
2008-09-11 20:01 -!- ebiederm(~eric@c-24-130-11-59.hsd1.ca.comcast.net) has joined #tux3
2008-09-11 20:01 hi eric
2008-09-11 20:01 so I'll have no chance to find it at Giant or Superfresh :(
2008-09-11 20:01 let me introduce you to one of the foremost kernel hackers in the known universe
2008-09-11 20:01 eric biederman
2008-09-11 20:01 say hi :-)
2008-09-11 20:01 hello all.
2008-09-11 20:02 eric is responsible for much of what makes linux great in the supercomputing cluster space
2008-09-11 20:02 ACTION also says Hello!
2008-09-11 20:02 konrad, don't be shy ;-)
2008-09-11 20:02 hi Eric
2008-09-11 20:02 hello :)
2008-09-11 20:02 well eric is not really a vfs guy, just a general genius
2008-09-11 20:03 knows everything about everything nearly
2008-09-11 20:03 :-)
2008-09-11 20:03 lol
2008-09-11 20:03 ACTION double checks that the logging is enabled
2008-09-11 20:03 hey
2008-09-11 20:03 also, nataliep_ up there is the linux kernel bug manager
2008-09-11 20:03 more folks have joined, nice
2008-09-11 20:03 ok, let's start
2008-09-11 20:04 first let me ask some questions: what does VFS stand for?
2008-09-11 20:04 virtual file system
2008-09-11 20:04 close but no
2008-09-11 20:04 subsystem? :D
2008-09-11 20:04 ACTION listens to the sound of googling
2008-09-11 20:04 hey
2008-09-11 20:04 maze!
2008-09-11 20:05 yeah, so 8pm is a little tight ;-)
2008-09-11 20:05 maze is about the smartest smart person I met at google
2008-09-11 20:05 hehe, thanks!
2008-09-11 20:05 no exaggeration
2008-09-11 20:05 ok, let's try again: what does VFS stand for?
2008-09-11 20:05 googling is ok
2008-09-11 20:05 ACTION is diluting the quality of the channel :P
2008-09-11 20:06 I doubt that, razvanm
2008-09-11 20:06 virtual file system
2008-09-11 20:06 er wait
2008-09-11 20:06 switch?
2008-09-11 20:06 that's been said hasn't it
2008-09-11 20:06 right!
2008-09-11 20:06 see?
2008-09-11 20:06 razvanm wins
2008-09-11 20:06 it stands for virtual filesystem switch
2008-09-11 20:06 versioning file system :P
2008-09-11 20:06 firefox had 'AVFS' at the top of my url bar for vfs :(
2008-09-11 20:06 how it got that name, I don't know
2008-09-11 20:06 it was the first hit for 'vfs lnux' :P
2008-09-11 20:06 eric probably does
2008-09-11 20:07 lol
2008-09-11 20:07 it switches between the different filesystems like a network switch switches between computers
2008-09-11 20:07 somebody better find out, because it's sure to come up at a geek challenge contest at linuxtag eventually
2008-09-11 20:07 yes
2008-09-11 20:07 it is a collection of methods that together implement a filesystem
2008-09-11 20:07 find out what?
2008-09-11 20:08 how it came to be called that
2008-09-11 20:08 I know where it came from but not why they picked the name. When they implemented the second filesystem on BSD they needed an abstraction layer.
2008-09-11 20:08 the vfs.txt from Documentation says: Overview of the Linux Virtual File System
2008-09-11 20:08 who came up with it
2008-09-11 20:08 etc
2008-09-11 20:08 ah
2008-09-11 20:08 trivia ;-)
2008-09-11 20:08 I knew eric would win that somehow ;-)
2008-09-11 20:08 well let me tell you
2008-09-11 20:08 the foremost filesystem dev on bsd does not know what vfs means
2008-09-11 20:08 :D
2008-09-11 20:08 or who called it that, or why
2008-09-11 20:08 yet he is definitely the foremost fs dev
2008-09-11 20:09 everybody know his name?
2008-09-11 20:09 quick...
2008-09-11 20:09 hint:
2008-09-11 20:09 I suck at trivia... I'm lucky to know my own name...
2008-09-11 20:09 McKusick?
2008-09-11 20:09 he engaged in a discussion re tux3 design recently
2008-09-11 20:09 mckusick is close but no
2008-09-11 20:10 hint: firefly
2008-09-11 20:10 Dillon?
2008-09-11 20:10 the dragonfly hammer guy?
2008-09-11 20:10 yes!
2008-09-11 20:10 Matt Dillon IIRC
2008-09-11 20:10 hammer?
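Both descriptions at 20:07 (a switch between filesystems, and a collection of methods that together implement a filesystem) fit the same mechanism: generic code calls through a table of function pointers that each filesystem fills in. A toy userspace sketch follows; struct fs_ops, struct sb, vfs_lookup and the three dummy filesystems are hypothetical, and the kernel's real tables (struct super_operations, struct inode_operations, struct file_operations) carry many more methods.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-reduced method table: each filesystem fills one in. */
struct fs_ops {
	const char *name;
	int (*lookup)(const char *path); /* inode number, or -1 if absent */
};

/* Generic superblock: the core only knows which table to call through. */
struct sb {
	const struct fs_ops *s_op;
};

static int memfs_lookup(const char *path)
{
	return strcmp(path, "/") == 0 ? 1 : -1;
}

static int diskfs_lookup(const char *path)
{
	return strcmp(path, "/") == 0 ? 2 : -1; /* ext2 keeps its root directory in inode 2 */
}

static const struct fs_ops memfs = { "memfs", memfs_lookup };
static const struct fs_ops diskfs = { "diskfs", diskfs_lookup };
static const struct fs_ops stubfs = { "stubfs", NULL }; /* implements nothing */

/* The "switch": generic code dispatches through whatever table the
 * mounted filesystem supplied, without knowing which filesystem it is. */
static int vfs_lookup(const struct sb *sb, const char *path)
{
	if (!sb->s_op->lookup)
		return -1; /* method not provided by this filesystem */
	return sb->s_op->lookup(path);
}
```

Adding a filesystem means adding a table, not touching vfs_lookup; that is the abstraction layer BSD needed once it had a second filesystem.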
2008-09-11 20:10 also responsible for linux having a reverse mapped vm
2008-09-11 20:10 used to be the bsd vm guy
2008-09-11 20:10 is now the vm fs guy
2008-09-11 20:10 and runs his own distro
2008-09-11 20:10 intensely clueful person
2008-09-11 20:10 ok
2008-09-11 20:10 let's do some vfs
2008-09-11 20:11 ACTION is ready
2008-09-11 20:11 and let's start from the opposite end that we started from yesterday
2008-09-11 20:11 everybody got their browsers ready?
2008-09-11 20:11 yesterday?
2008-09-11 20:11 eh
2008-09-11 20:11 day before yesterday
2008-09-11 20:11 last time ;-)
2008-09-11 20:11 :D
2008-09-11 20:11 loaded ;-)
2008-09-11 20:12 lxr.linux.no should be my homepage or something
2008-09-11 20:12 let's go here: http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c
2008-09-11 20:12 super.c is the "main" for a linux filesystem
2008-09-11 20:12 we might call it tux3.c for tux3, or we might go with tradition and call it super.c
2008-09-11 20:13 it's got module_{init,exit}
2008-09-11 20:13 it has two basic tasks: 1) parse the mount options 2) load the fs superblock
2008-09-11 20:13 right
2008-09-11 20:13 it takes care of a few other details besides
2008-09-11 20:13 so let's take a look at some really crappy parsing code
2008-09-11 20:14 parse_options
2008-09-11 20:14 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L428
2008-09-11 20:14 line 429
2008-09-11 20:14 oops :)
2008-09-11 20:14 depends on the version of course
2008-09-11 20:15 429 on mine as well
2008-09-11 20:15 nothing really interesting here
2008-09-11 20:15 just good to know where it is
2008-09-11 20:15 so, there isn't actually such a thing as a linux "mount" program
2008-09-11 20:15 so it gets a string and a pointer to the superblock?
2008-09-11 20:15 all we do is call the fs's mount entry point
2008-09-11 20:15 sbi
2008-09-11 20:16 not quite the same
2008-09-11 20:16 sbi is the filesystem-specific bit of a superblock
2008-09-11 20:16 so that's the in-mem representation of an ext2 superblock
2008-09-11 20:16 superblocks and inodes in linux are both generic structures
2008-09-11 20:16 almost
2008-09-11 20:16 re in-mem rep
2008-09-11 20:17 there is also an exact image of the disk superblock that ext2 keeps around
2008-09-11 20:17 I don't know if tux3 will bother
2008-09-11 20:17 we shall see, that is a fiddly detail
2008-09-11 20:17 the sbi corresponds to what is called struct sb in the tux3 userspace
2008-09-11 20:18 and tux3 doesn't really have a generic superblock implemented at the moment
2008-09-11 20:18 linux kernel does
2008-09-11 20:18 superblock fields are separated into two classes: 1) ones that core vfs knows what to do with 2) ones that only mean something to the filesystem
2008-09-11 20:18 inodes are separated the same way
2008-09-11 20:19 by a completely different mechanism, for no good reason
2008-09-11 20:19 any idea what the 0pt_ in the tokens means?
2008-09-11 20:19 the superblock specialization is via a fs-specific pointer
2008-09-11 20:19 oh, it's opt not 0pt ;-)
2008-09-11 20:19 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L395
2008-09-11 20:20 not really
2008-09-11 20:20 5 minutes of poking will answer that
2008-09-11 20:20 or 1 minute
2008-09-11 20:20 there is some fairly trivial macro magic going on here and there
2008-09-11 20:20 [I mis-parsed as zero-pt font size...]
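On the 20:19 question: Opt_ is just the naming convention for the token values in ext2's option table. In 2.6.26, ext2's parse_options splits the option string on commas with strsep() and maps each piece to a token with match_token() from <linux/parser.h>. The sketch below is a self-contained userspace imitation of that shape, not the kernel helpers: the three-entry token table (ext2's real bsddf, minixdf and resuid= options), struct mount_opts and match_option are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Token values follow the kernel's Opt_ naming convention. */
enum { Opt_bsd_df, Opt_minix_df, Opt_resuid, Opt_err };

struct opt_pattern {
	int token;
	const char *prefix;
	int takes_arg; /* 1 if a value follows the prefix, as in "resuid=%u" */
};

static const struct opt_pattern tokens[] = {
	{ Opt_bsd_df,   "bsddf",   0 },
	{ Opt_minix_df, "minixdf", 0 },
	{ Opt_resuid,   "resuid=", 1 },
};

/* Hypothetical per-mount settings; a real fs keeps these in its sbi. */
struct mount_opts {
	int minix_df;
	unsigned long resuid;
};

/* Map one option (len bytes at opt) to a token; *arg points at its value. */
static int match_option(const char *opt, size_t len, const char **arg)
{
	for (size_t i = 0; i < sizeof tokens / sizeof tokens[0]; i++) {
		size_t plen = strlen(tokens[i].prefix);
		int fits = tokens[i].takes_arg ? len > plen : len == plen;
		if (fits && strncmp(opt, tokens[i].prefix, plen) == 0) {
			*arg = opt + plen;
			return tokens[i].token;
		}
	}
	return Opt_err;
}

/* Parse "opt1,opt2,..."; returns 0 on success, -1 on an unknown or malformed option. */
static int parse_options(const char *options, struct mount_opts *mo)
{
	while (*options) {
		size_t len = strcspn(options, ",");
		const char *arg = NULL;
		char *end;

		if (len > 0) { /* tolerate empty items such as "a,,b" */
			switch (match_option(options, len, &arg)) {
			case Opt_bsd_df:
				mo->minix_df = 0;
				break;
			case Opt_minix_df:
				mo->minix_df = 1;
				break;
			case Opt_resuid:
				mo->resuid = strtoul(arg, &end, 10);
				if (end != options + len)
					return -1; /* trailing junk, e.g. "resuid=12x" */
				break;
			default:
				return -1;
			}
		}
		options += len;
		if (*options == ',')
			options++;
	}
	return 0;
}
```

The sketch replaces match_token's %u/%s pattern handling with a plain prefix test plus strtoul(), which is enough to show why the whole job should fit on one screen.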
2008-09-11 20:20 anyway
2008-09-11 20:21 like I said, awful parsing code
2008-09-11 20:21 used to be a lot worse
2008-09-11 20:21 gets the job done in way too many lines
2008-09-11 20:21 well let's look at a more interesting bit
2008-09-11 20:21 loading the superblock
2008-09-11 20:21 quite tricky
2008-09-11 20:21 because the filesystem isn't working yet
2008-09-11 20:21 we don't even know the blocksize
2008-09-11 20:22 we have ext2_get_sb
2008-09-11 20:22 which is stored in the ext2_fs_type structure
2008-09-11 20:23 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L366
2008-09-11 20:23 of type "file_system_type"
2008-09-11 20:23 this is the starting point for any filesystem
2008-09-11 20:23 the tip of the iceberg
2008-09-11 20:23 root of the tree
2008-09-11 20:23 heart of the dragon etc
2008-09-11 20:24 file_system_type defines a few methods, by far the most important of which is get_sb
2008-09-11 20:24 this structure is passed to register_filesystem
2008-09-11 20:24 when the module is initialized
2008-09-11 20:24 which happens these days whether or not it is actually a module by the way
2008-09-11 20:25 and that makes the filesystem appear in /proc/filesystems
2008-09-11 20:25 so everybody should do cat /proc/filesystems now
2008-09-11 20:25 and tell what they see there that is really interesting
2008-09-11 20:26 lots of nodev's
2008-09-11 20:26 lots of internal no-blockdev fs'es and 4 dev-fs'es
2008-09-11 20:26 suggesting that nodev is a stupid idea...
2008-09-11 20:26 which is true
2008-09-11 20:26 and?
2008-09-11 20:26 my only non-nodev are ext3 and vfat :P
2008-09-11 20:26 well, there's ext3, hfsplus, iso9660, fuseblk
2008-09-11 20:26 right
2008-09-11 20:26 and there is no tux3
2008-09-11 20:26 and a ton of internal ones (usb, ramfs, etc...)
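A userspace model of what register_filesystem does with a struct file_system_type, and of how /proc/filesystems is rendered from the resulting list. It assumes a simplified struct (name, flags, next) and omits the kernel's locking and module reference counting; the FS_REQUIRES_DEV name matches the kernel flag, but its value here is arbitrary. The output format (a "nodev" column for types that do not need a block device, then a tab and the name) follows the kernel's /proc/filesystems.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define FS_REQUIRES_DEV 1 /* kernel flag name; the value here is arbitrary */

struct file_system_type {
	const char *name;
	int fs_flags;
	struct file_system_type *next;
};

static struct file_system_type *file_systems; /* registration-ordered list */

/* Slot holding the type called name, or the empty tail slot if none does. */
static struct file_system_type **find_filesystem(const char *name)
{
	struct file_system_type **p = &file_systems;
	while (*p && strcmp((*p)->name, name) != 0)
		p = &(*p)->next;
	return p;
}

static int register_filesystem(struct file_system_type *fs)
{
	struct file_system_type **p = find_filesystem(fs->name);
	if (*p)
		return -EBUSY; /* name already taken */
	fs->next = NULL;
	*p = fs;               /* append at the tail */
	return 0;
}

/* Render the list in /proc/filesystems style: "nodev" unless the type
 * needs a block device, then a tab and the name. Stops at the last line
 * that fits; returns the number of bytes written (excluding the NUL). */
static size_t list_filesystems(char *buf, size_t size)
{
	size_t used = 0;
	if (size == 0)
		return 0;
	buf[0] = '\0';
	for (const struct file_system_type *t = file_systems; t; t = t->next) {
		int n = snprintf(buf + used, size - used, "%s\t%s\n",
				 (t->fs_flags & FS_REQUIRES_DEV) ? "" : "nodev", t->name);
		if (n < 0 || (size_t)n >= size - used) {
			buf[used] = '\0'; /* drop the partial line */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

Registering a second type under an existing name fails with -EBUSY, and types are listed in registration order, so a tux3 module registered after boot would show up at the end of cat /proc/filesystems.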
2008-09-11 20:26 :D
2008-09-11 20:26 that is the most important thing to notice
2008-09-11 20:27 and that is why there is a tux3 university
2008-09-11 20:27 notice also that there is a ramfs
2008-09-11 20:27 ramfs is the second most useful filesystem for learning about the vfs
2008-09-11 20:27 the most useful being ext2
2008-09-11 20:27 also sockfs
2008-09-11 20:27 is sockfs for unix domain sockets?
2008-09-11 20:27 suckfs
2008-09-11 20:27 right
2008-09-11 20:28 :-)
2008-09-11 20:28 I'd prefer a shoefs
2008-09-11 20:28 don't take anything from the net side of linux as an example of anything besides "fast"
2008-09-11 20:28 sk8fs
2008-09-11 20:28 yup
2008-09-11 20:28 I see "fuse"
2008-09-11 20:28 interesting
2008-09-11 20:28 in fact, 3 or them
2008-09-11 20:28 fuse, fuseblk, and fusectl
2008-09-11 20:28 3 of them
2008-09-11 20:29 that's a little over the top
2008-09-11 20:29 oh, this one is always a laugh: hugetlbfs
2008-09-11 20:29 a naive person would think one would be enough
2008-09-11 20:29 or would already be one too many
2008-09-11 20:29 hugetlbfs is indeed the worst fs ever conceived
2008-09-11 20:29 what's the difference between rootfs/ramfs/tmpfs ?
2008-09-11 20:29 sometimes even the great penguin has bad days
2008-09-11 20:29 rootfs exists just to get linux booted
2008-09-11 20:30 probably a bad idea
2008-09-11 20:30 but that's how it works
2008-09-11 20:30 ramfs is really interesting
2008-09-11 20:30 it is basically just the vfs cache layer of a fs with all backing store stripped away
2008-09-11 20:30 it is worth reading every line
2008-09-11 20:30 is the split merely to be able to shave off more code in embedded?
2008-09-11 20:31 it is split for tutorial reasons
2008-09-11 20:31 ;-)
2008-09-11 20:31 ramfs is to serve as an example of a minimal fs with no backing store
2008-09-11 20:31 somehow it bloated up to 589 lines though
2008-09-11 20:31 when it really only needs 150 maybe
2008-09-11 20:32 so I guess somebody didn't get the memo ;-)
2008-09-11 20:32 tmpfs is the real workhorse
2008-09-11 20:32 that is basically ramfs backed by the swap device
2008-09-11 20:32 $ wc -l file-mmu.c
2008-09-11 20:32 53 file-mmu.c
2008-09-11 20:32 common mounted on /tmp these days
2008-09-11 20:32 commonly
2008-09-11 20:32 ok, I'll take a short break
2008-09-11 20:33 to refill my cabernet
2008-09-11 20:33 so tmpfs can be swapped out, while ramfs and rootfs can't
2008-09-11 20:33 linus pronounces 'vfs' as 'virtual filesystems' in ramfs/inode.c
2008-09-11 20:33 and why don't you compare notes?
2008-09-11 20:33 linus doesn't always get it right ;-)
2008-09-11 20:33 tytso would normally clobber him in a geek trivia contest
2008-09-11 20:34 http://farm1.static.flickr.com/164/413387043_ab2c7569a4.jpg :P
2008-09-11 20:34 :-)
2008-09-11 20:35 the reflection isn't quite as nice here
2008-09-11 20:35 but it does reflect, in this idea desk
2008-09-11 20:35 ikea
2008-09-11 20:35 ACTION also sits at an ikea desk ;-)
2008-09-11 20:36 ok, let's go up to ext2_fill_super
2008-09-11 20:36 we pass that as a method to a vfs library call
2008-09-11 20:36 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L737
2008-09-11 20:36 if you think that is an odd way to init a fs you'd be right ;-)
2008-09-11 20:36 so what is an sbi?
2008-09-11 20:37 ACTION waits
2008-09-11 20:37 sb info
2008-09-11 20:37 ext2_sb_info ptr
2008-09-11 20:37 right, and what points at it?
2008-09-11 20:37 sb->s_fs_info
2008-09-11 20:38 right
2008-09-11 20:38 so that is how the linux fs specializes a superblock
2008-09-11 20:38 by having s_fs_info point at something allocated and initialized by the fs
2008-09-11 20:38 that only the fs will ever use
2008-09-11 20:38 how does it know how big to make it?
2008-09-11 20:39 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L768 <- here we read the superblock
2008-09-11 20:39 MaZe: sizeof(*sbi)
2008-09-11 20:39 maze, the fs declares it, and it makes it sizeof(that)
2008-09-11 20:39 won't that be fs dependent though?
2008-09-11 20:39 it is
2008-09-11 20:39 that is why it is a fs-specific pointer field
2008-09-11 20:39 pointer is always the same size :)
2008-09-11 20:40 core vfs will never look there
2008-09-11 20:40 right
2008-09-11 20:40 oh, right it's allocated within ext2 code
2008-09-11 20:40 thank goodness for that small mercy
2008-09-11 20:40 right
2008-09-11 20:40 here http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L755
2008-09-11 20:40 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L144
2008-09-11 20:40 one can easily imagine a universe in which pointers on the same machine are not all the same size
2008-09-11 20:41 keep'em beasties far away from me...
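A sketch of the generic/private split just described, combined with the direct superblock read argued for next (read the fixed location, with no device blocksize involved). The generic struct knows only s_blocksize and an opaque s_fs_info pointer; the filesystem allocates its own info with sizeof(*sbi) and hangs it there. The offsets are those of the ext2 on-disk superblock (1024 bytes into the device; s_inodes_count at 0, s_log_block_size at 24, s_magic 0xEF53 at 56, s_rev_level at 76, s_first_ino at 84, all little-endian). The toy_ names are hypothetical, and the "device" is an in-memory image rather than a bio or sb_bread().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Generic part: what core code understands, plus one opaque pointer. */
struct super_block {
	unsigned long s_blocksize;
	void *s_fs_info; /* owned by the filesystem; core code never looks inside */
};

/* Filesystem-private part; hypothetical, loosely modelled on ext2_sb_info. */
struct toy_sb_info {
	uint32_t inodes_count;
	uint32_t first_ino; /* first non-reserved inode */
};

static struct toy_sb_info *TOY_SB(struct super_block *sb)
{
	return sb->s_fs_info; /* same idea as ext2's EXT2_SB() */
}

#define SUPER_OFFSET 1024 /* the ext2 superblock lives 1024 bytes into the device */
#define SUPER_SIZE 1024
#define EXT2_SUPER_MAGIC 0xEF53
#define EXT2_GOOD_OLD_FIRST_INO 11

static uint16_t le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}

static uint32_t le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 0 on success, -1 on a short image, bad magic, bad block size or no memory. */
static int toy_fill_super(struct super_block *sb, const uint8_t *dev, size_t devsize)
{
	if (devsize < SUPER_OFFSET + SUPER_SIZE)
		return -1;
	const uint8_t *raw = dev + SUPER_OFFSET; /* fixed place, no device blocksize needed */
	if (le16(raw + 56) != EXT2_SUPER_MAGIC)  /* s_magic */
		return -1;
	uint32_t log_bs = le32(raw + 24);        /* s_log_block_size */
	if (log_bs > 6)                          /* larger than 64 KiB blocks: reject */
		return -1;

	struct toy_sb_info *sbi = calloc(1, sizeof(*sbi)); /* only the fs knows this size */
	if (!sbi)
		return -1;
	sbi->inodes_count = le32(raw + 0);       /* s_inodes_count */
	sbi->first_ino = le32(raw + 76) == 0     /* s_rev_level 0: good old revision */
		? EXT2_GOOD_OLD_FIRST_INO
		: le32(raw + 84);                /* s_first_ino */
	sb->s_blocksize = 1024UL << log_bs;
	sb->s_fs_info = sbi;
	return 0;
}
```

Everything after the magic check is plain field decoding, which is roughly the "dozen lines from the fill_super entry" target mentioned later for tux3.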
2008-09-11 20:41 so there is some braindamage about trying to use the "blocksize of the device" to load the superblock
2008-09-11 20:41 bad idea
2008-09-11 20:41 should just assume that it is always the same size
2008-09-11 20:41 there is no legitimate concept of blocksize on a device, actually
2008-09-11 20:42 never mind that I have coded one in my vfs emulation ;-)
2008-09-11 20:42 that is a wart I will get rid of probably one day when it irritates me enough
2008-09-11 20:42 only the fs sbi should know the blocksize of the filesystem
2008-09-11 20:43 so, that nonsense about device blocksize is so that ext2 can use "sb_bread" to read the superblock
2008-09-11 20:43 again, there is no reason for this
2008-09-11 20:43 the tux3 userspace code directly calls "diskIo" there
2008-09-11 20:43 bypassing the buffer emulation
2008-09-11 20:43 and ext2 really should do the same, not have that fragile blocksize code there
2008-09-11 20:44 not get_sb_bdev ?
2008-09-11 20:44 right
2008-09-11 20:44 equivalent of tux3 diskio
2008-09-11 20:44 well
2008-09-11 20:44 these fns have a lot of cruft attached
2008-09-11 20:44 been through many iterations of doing things the wrong way
2008-09-11 20:45 so you want to go to the lowest level thing that will actually read if you want to be clear and robust here
2008-09-11 20:45 I'd be tempted to submit a bio
2008-09-11 20:45 but anyway
2008-09-11 20:45 we'll get there soon enough, and have to implement our own version of that
2008-09-11 20:45 let's do it a little more cleanly, but we don't have to save the world
2008-09-11 20:45 just now
2008-09-11 20:46 873 /* If the blocksize doesn't match, re-read the thing.. */ <- excellent example of yunk
2008-09-11 20:46 huck
2008-09-11 20:46 yuck
2008-09-11 20:46 :-)
2008-09-11 20:46 "yunk" is short for "yucky junk"
2008-09-11 20:46 and "huck" is what we will do with that in tux3
2008-09-11 20:47 so by here ext2 has managed to read its superblock: http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L898
2008-09-11 20:47 should actually have only been 3 lines, though we did do some options processing as well
2008-09-11 20:48 most of that is historical cruft
2008-09-11 20:48 keep in mind that ext2 is one of the cleanest filesystems ;-)
2008-09-11 20:48 :D
2008-09-11 20:48 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L915 <- ext2 dutifully reads the frag size, even though this bsd ufs concept was never implemented and never will be
2008-09-11 20:49 http://lxr.linux.no/linux+v2.6.26.5/fs/ext2/super.c#L941 <- it checks the super magic
2008-09-11 20:49 tux3 gets to this point about 20 lines in or so
2008-09-11 20:50 a few more than that actually
2008-09-11 20:50 tux3.c
2008-09-11 20:50 but in the kernel implementation, it will be about a dozen lines from the fill_super entry
2008-09-11 20:50 as it should be
2008-09-11 20:51 next big job is to read the root directory!
2008-09-11 20:51 this is exciting because the filesystem isn't working yet
2008-09-11 20:51 wouldn't it be enough to just get the rootdir's inode number?
2008-09-11 20:51 we need to get the root dir up and running as an inode
2008-09-11 20:51 so that open(2) and readdir work on it
2008-09-11 20:52 so yes
2008-09-11 20:52 we need to know the rootdir's inode number
2008-09-11 20:52 that has evolved over time with ext2
2008-09-11 20:52 it used to just be a fixed number
2008-09-11 20:52 now there is a fancier method
2008-09-11 20:52 for no good reason
2008-09-11 20:53 Tux3 uses inode number 0xd (for "directory" or "daniel") for the root dir
2008-09-11 20:53 http://lxr.linux.no/linux+v2.6.26.5/include/linux/ext2_fs.h#L61
2008-09-11 20:53 right
2008-09-11 20:53 somewhere there is "good ol'" something
2008-09-11 20:54 first non-reserved is 11
2008-09-11 20:54 I might have conflated that with something else
2008-09-11 20:54 #define EXT2_GOOD_OLD_FIRST_INO 11
2008-09-11 20:54 well it doesn't matter except for geek quizzes
2008-09-11 20:54 yah
2008-09-11 20:54 that's it
2008-09-11 20:54 ok, we have 6 minutes for questions
2008-09-11 20:54 going to stop right here, just before doing anything interesting ;-)
2008-09-11 20:55 exactly! :O
2008-09-11 20:55 ouch
2008-09-11 20:55 when's the next meeting? next tuesday at 8pm?
2008-09-11 20:55 this lesson was definitely shorter...
2008-09-11 20:55 well it was fun looking at all that busy looking code that doesn't actually do much, no?
2008-09-11 20:55 it seemed too little time
2008-09-11 20:55 how about tomorrow? :D
2008-09-11 20:55 next tuesday, yes
2008-09-11 20:55 yeah, tomorrow works
2008-09-11 20:55 tuesday, then?
2008-09-11 20:56 homework is: know how the root dir is loaded and initialized, and now that differs from how any other inode is opened
2008-09-11 20:56 and how
2008-09-11 20:56 I meant
2008-09-11 20:56 tomorrow is friday ;-)
2008-09-11 20:57 so what's the 'desired' way to read data off disk in a fs? submit bio-s? would that also be the best way to read the superblock (you seem to have suggested that)
2008-09-11 20:57 friday is my most productive day :P
2008-09-11 20:57 not only do I have to relax then, I have to get atom refcounting working
2008-09-11 20:57 maze, I like submit_bio, yes
2008-09-11 20:57 then you have to wait on some lock
2008-09-11 20:57 is that the lowest level interface to the block device layer?
2008-09-11 20:57 two or three lines
2008-09-11 20:57 it is
2008-09-11 20:57 the lowest one you can use without getting shouted at
2008-09-11 20:57 does it support priorities?
2008-09-11 20:57 depends on the elevator
2008-09-11 20:58 mostly linux elevators are pretty crappy
2008-09-11 20:58 no good rt elevator for example
2008-09-11 20:58 if that's what you're asking
2008-09-11 20:58 yeah, something like that
2008-09-11 20:58 feel free to write a noncrappy one
2008-09-11 20:58 you're the man to do it
2008-09-11 20:59 if I'm operating on behalf of a user, and he's running at some prio, or asking for some priority on his read/write/file op, then I'd like to be able to pass that down to the blockdev layer
2008-09-11 20:59 yes, and save us from that broken pos that is the current io scheduler
2008-09-11 20:59 I mean I obviously shouldn't be dealing with that in the fs, except for making sure I submit requests with the right priorities
2008-09-11 20:59 no, not in the fs
2008-09-11 20:59 though one can imagine the fs making suggestions
2008-09-11 21:00 and a realtime fs most certainly has to interact with the io scheduler
2008-09-11 21:00 (submit_bio is used only by xfs, ocfs2, jfs, gfs2 and ext4)
2008-09-11 21:00 the fs also has to answer the question "can I submit this request at all, and meet the constraints"
2008-09-11 21:00 what about networking? how would you go about sending/receiving udp? tcp? raw frames? other protocol?
2008-09-11 21:00 (what do the others use?)
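MaZe's request at 20:59 (carry the caller's priority down with each request), together with the inversion case he raises at 21:01 (a low-priority task is already fetching a directory block that a higher-priority task now needs), can be modelled in a few lines: a later, more urgent request for a block that is already queued raises the pending request's priority instead of queueing a duplicate behind it. This is a hypothetical userspace sketch (struct io_queue, submit_read and dispatch are invented names; lower numbers mean more urgent), not the Linux block layer, where per-process I/O priorities are set with ioprio_set(2) and whether they matter depends on the I/O scheduler.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16

/* One in-flight read. Lower prio value = more urgent. */
struct io_req {
	unsigned long block;
	int prio;
	int waiters;
};

struct io_queue {
	struct io_req req[MAX_PENDING];
	size_t count;
};

/* Queue a read at the caller's priority. If the block is already pending
 * (say, a background scan asked for it first), raise that request's
 * priority instead of queueing a duplicate behind it.
 * Returns the request, or NULL if the queue is full. */
static struct io_req *submit_read(struct io_queue *q, unsigned long block, int prio)
{
	for (size_t i = 0; i < q->count; i++) {
		struct io_req *r = &q->req[i];
		if (r->block == block) {
			if (prio < r->prio)
				r->prio = prio; /* priority boost */
			r->waiters++;
			return r;
		}
	}
	if (q->count == MAX_PENDING)
		return NULL;
	q->req[q->count] = (struct io_req){ block, prio, 1 };
	return &q->req[q->count++];
}

/* Stand-in for the elevator: hand out the most urgent pending request.
 * Returns 1 and fills *out, or 0 if nothing is pending. */
static int dispatch(struct io_queue *q, struct io_req *out)
{
	if (q->count == 0)
		return 0;
	size_t best = 0;
	for (size_t i = 1; i < q->count; i++)
		if (q->req[i].prio < q->req[best].prio)
			best = i;
	*out = q->req[best];
	q->req[best] = q->req[--q->count]; /* fill the hole with the last entry */
	return 1;
}
```

Pointers returned by submit_read are only valid until the next dispatch, since dispatch compacts the array. Boosting in place rather than resubmitting is one of the two options MaZe mentions; it avoids a second read of the same block.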
2008-09-11 21:01 only the fs can know certain crucial information about those constraints
2008-09-11 21:01 networking?
2008-09-11 21:01 sorry, missed the connection
2008-09-11 21:01 you mean realtime?
2008-09-11 21:01 [have to be careful - low prio process fetches a directory, higher priority process then needs to fetch it again - needs to result in increasing the bio priority or resubmitting it or something]
2008-09-11 21:01 razvanm, notice that submit_bio is used in all _modern_ fs's
2008-09-11 21:01 networking connection - I'm imagining a disk and network based multi-node fs
2008-09-11 21:02 I'm imaginative ;-)
2008-09-11 21:02 gfs2 only loosely meeting that definition
2008-09-11 21:02 flips: right :D
2008-09-11 21:02 maze, you're already fixing IO priority inversion?
2008-09-11 21:03 ah
2008-09-11 21:03 right
2008-09-11 21:03 no, just pointing out you have to be careful
2008-09-11 21:03 that kind of networking
2008-09-11 21:03 you do
2008-09-11 21:03 and as a rule we are not
2008-09-11 21:03 far from it
2008-09-11 21:03 tcp/ip is not realtime
2008-09-11 21:03 however
2008-09-11 21:03 so there's a lot of things I'd like to work on if I had the time ;-)
2008-09-11 21:03 you can kinda sorta pretend it is, sometimes
2008-09-11 21:03 networking is real-time if you have caching done correctly ;-)
2008-09-11 21:03 right
2008-09-11 21:04 really?
2008-09-11 21:04 you will have to convince me of that
2008-09-11 21:04 I think that random backoff already makes it not realtime
2008-09-11 21:04 CSMA/CD
2008-09-11 21:04 or something like that
2008-09-11 21:04 oh, ok, I don't mean RT as in rtlinux rt
2008-09-11 21:04 carrier sense multiple access collision detect
2008-09-11 21:04 I meant usable on a desktop
2008-09-11 21:04 ah
2008-09-11 21:04 I always mean actual rt when somebody says rt
2008-09-11 21:05 flips: if the data is identified by some hashes over it then it could be ;-)
2008-09-11 21:05 I meant usable and not get killed by background tasks
2008-09-11 21:05 there linus and I differ
2008-09-11 21:05 (for reading)
2008-09-11 21:05 uhm, I never mentioned rt ;-)
2008-09-11 21:05 razvanm, what could be?
2008-09-11 21:05 maze, ok
2008-09-11 21:05 sorry
2008-09-11 21:05 try again?
2008-09-11 21:05 too many threads :D
2008-09-11 21:05 yup
2008-09-11 21:05 while rt is nice of course, and you should design with making it possible in the future of course
2008-09-11 21:05 the phillips switch is overloading
2008-09-11 21:06 I just wanted fg tasks to be able to run at higher priority than bg tasks (a garbage collector or bg file scan or ...)
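The 21:04 objection, made concrete: classic shared-medium Ethernet (CSMA/CD) resolves collisions with truncated binary exponential backoff. After the nth collision a station waits a random number of slot times in [0, 2^min(n,10) - 1], and after 16 attempts the frame is dropped, so there is no delivery deadline to promise. The constants below are the IEEE 802.3 values; rand() stands in for the hardware's random source, and the function names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

#define BACKOFF_LIMIT 10 /* the exponent stops growing after 10 collisions */
#define ATTEMPT_LIMIT 16 /* the frame is abandoned after 16 attempts */

/* Largest backoff (in slot times) allowed after the nth collision. */
static unsigned max_backoff_slots(unsigned collisions)
{
	unsigned k = collisions < BACKOFF_LIMIT ? collisions : BACKOFF_LIMIT;
	return (1u << k) - 1;
}

/* Random backoff for the nth collision, or -1 once the frame must be dropped. */
static long backoff_slots(unsigned collisions)
{
	if (collisions >= ATTEMPT_LIMIT)
		return -1;
	return rand() % ((long)max_backoff_slots(collisions) + 1);
}

/* Sum of the largest possible waits over every retry before giving up. */
static unsigned long worst_case_slots(void)
{
	unsigned long total = 0;
	for (unsigned n = 1; n < ATTEMPT_LIMIT; n++)
		total += max_backoff_slots(n);
	return total;
}
```

The worst case over the 15 retries is 7151 slot times (about 0.37 s of pure backoff at 10 Mbit/s, where a slot is 512 bit times, 51.2 µs), and even then delivery is not guaranteed, which is the sense in which random backoff rules out hard real-time.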
2008-09-11 21:06 ;-) 2008-09-11 21:06 ok, well a single node filesystem has no business knowing anything about networking 2008-09-11 21:06 right 2008-09-11 21:06 yes, you have control over that 2008-09-11 21:06 complete control 2008-09-11 21:06 you are root 2008-09-11 21:06 beyond root 2008-09-11 21:06 we've already determined that a fs has to provide some interfaces to the vfs layer, and it interfaces with the blockdev layer via bio's 2008-09-11 21:07 there's only one limitation to what a filesystem in linux can do: use symbols that are not exported to modules, when it is compiled as a module 2008-09-11 21:07 add in some atomics/locks/primitives already provided by the kernel and mem management, and you have all pieces ;-) 2008-09-11 21:07 yes 2008-09-11 21:07 watch out for layer violations 2008-09-11 21:07 but in general, go crazy 2008-09-11 21:08 so basically, now the question was: how to implement nfs - what would the interface not to blockdev, but to network, be? 2008-09-11 21:08 there is not much to do 2008-09-11 21:08 nfs basically runs on top of a filesystem that doesn't even have to know its there 2008-09-11 21:08 there are a few small, weird hooks 2008-09-11 21:08 uhm? 2008-09-11 21:08 the details of which I forget 2008-09-11 21:08 nfs stacks on top of a host fs 2008-09-11 21:09 the host fs doesn't have to know it's being stacked on 2008-09-11 21:09 it just have to behave itself 2008-09-11 21:09 like a unix fs 2008-09-11 21:09 what do you mean by host fs? oh you mean for the nfs server? 
2008-09-11 21:09 that's actually pretty hard ;p) 2008-09-11 21:09 I was thinking about the nfs client 2008-09-11 21:09 right 2008-09-11 21:09 ah, nfs client 2008-09-11 21:09 strange exception to pretty much everything 2008-09-11 21:09 it stacks on top of a remote host fs 2008-09-11 21:10 with all the oddities that implies 2008-09-11 21:10 including mid-flight reboots 2008-09-11 21:10 indeed 2008-09-11 21:10 there are papers written about how much this sucks 2008-09-11 21:10 let me see 2008-09-11 21:10 http://www.cc.gatech.edu/classes/AY2007/cs4210_fall/papers/nfsOLS.pdf 2008-09-11 21:10 the reboot? yeah, that's terrible, but it can be done in a way that it would work 2008-09-11 21:10 marginally 2008-09-11 21:10 you'd detect remote server reboot and have to dump caches, etc... 2008-09-11 21:11 I've been living/breathing that for the last 3 years 2008-09-11 21:11 I know ;-) 2008-09-11 21:11 yes, but we don't 2008-09-11 21:11 it's pathetic 2008-09-11 21:11 nobody pays attention to statd 2008-09-11 21:11 except lockd 2008-09-11 21:11 no excuse 2008-09-11 21:11 oh, I'm not thinking about NFS, I hate NFS, I'm thinking about a networkfs 2008-09-11 21:11 sun braindamage 2008-09-11 21:11 and linux too, because we should have fixed it by now 2008-09-11 21:12 oh a real networkfs 2008-09-11 21:12 just trying to figure out what the layering is there vfs / networkfs (missing this interface layer) networking 2008-09-11 21:12 well, lustre is getting close 2008-09-11 21:12 oscfs2 also 2008-09-11 21:12 I'm sure you will crack that one 2008-09-11 21:12 will be fun to watch your progress 2008-09-11 21:12 in the meantime, goals with tux3 are modest 2008-09-11 21:12 I need more than 24 hours in a day 2008-09-11 21:13 that is: support nfs no worse than any other filesystem 2008-09-11 21:13 hopefully much better 2008-09-11 21:13 hehe 2008-09-11 21:13 ebiederm, thanks for visiting 2008-09-11 21:14 I hope we did not disappoint ;-) 2008-09-11 21:14 an OT question: why hg and not git? 
2008-09-11 21:14 ok, it is back to the question of atom refcounting 2008-09-11 21:14 you been following the thread, maze? 2008-09-11 21:15 sorry, which thread? 2008-09-11 21:15 razvanm, hg is a lot more usable than git 2008-09-11 21:15 about mercurial? 2008-09-11 21:15 instand on 2008-09-11 21:15 maze, no, about xattr atoms 2008-09-11 21:15 ah, no. 2008-09-11 21:15 on the tux3 list 2008-09-11 21:15 should I? 2008-09-11 21:15 please 2008-09-11 21:15 you subscribed? 2008-09-11 21:16 glancing ;-) 2008-09-11 21:16 I think I subscribed you 2008-09-11 21:16 more xattr design details? 2008-09-11 21:16 right, and associated posts 2008-09-11 21:16 the parent of that is the root of that tree 2008-09-11 21:17 uhm, gmail doesn't do trees ;-) 2008-09-11 21:17 they should fix that 2008-09-11 21:17 :p 2008-09-11 21:17 it's only beta 2008-09-11 21:17 right, it's also slow... 2008-09-11 21:17 let me see 2008-09-11 21:17 I know, I run exim4 here and it's beyond fast 2008-09-11 21:17 it's scary 2008-09-11 21:18 so I'm a big fan of atoms, because the space saving can be extreme 2008-09-11 21:18 [Tux3] The long and short of extended attributes 2008-09-11 21:18 ah, I like the sound of that 2008-09-11 21:18 you probably want to support even more atoms for selinux... but then the code gets complex 2008-09-11 21:18 I've been doing a lot of introspecting about it 2008-09-11 21:19 so you have the easy solution - use no atoms 2008-09-11 21:19 always on the verge of mass deleting that code 2008-09-11 21:19 I know, but I also feel its lame 2008-09-11 21:19 and just store rep { string=string } 2008-09-11 21:19 no null's thanks ;-) 2008-09-11 21:19 ext3 is 8 bit clean 2008-09-11 21:19 but otherwise yes 2008-09-11 21:19 (mind you I'd actually store that in reversed order, at the front of the file, going backwards towards negative offsets) 2008-09-11 21:20 reccount, namecount, , 2008-09-11 21:20 have it stored the same way as the rest of the file data 2008-09-11 21:20 ? 
2008-09-11 21:20 xattr1=value1 xattr2=value2 filecontent="hello" ==> 2008-09-11 21:21 sorry, I meant tux3 is 8 bit clean 2008-09-11 21:21 where are the negative offsets? 2008-09-11 21:21 oh I see 2008-09-11 21:21 2eulav=2rttax 1eulav=1rttax hello 2008-09-11 21:21 | offset 0 at [H] in hello 2008-09-11 21:21 demented ;-) 2008-09-11 21:21 interesting idea 2008-09-11 21:21 it means you don't have to implement it though ;-) 2008-09-11 21:22 well the page cache doesn't have negative offsets 2008-09-11 21:22 you'd have to store at the top of the index range 2008-09-11 21:22 that's a good idea 2008-09-11 21:22 it should work out fine 2008-09-11 21:22 means you can't quite have a 16 TB file on 32 bit linux though 2008-09-11 21:23 16 TB less the maximum size of attributes 2008-09-11 21:23 no, you shave it down by however many xattrs you have 2008-09-11 21:23 so maybe a few kilobytes - in the future maybe more... who knows 2008-09-11 21:23 ok, that's twisted enough for me 2008-09-11 21:23 in what sense is it twisted? 2008-09-11 21:23 works perfectly on 64 bit linux... probably find a couple of radix tree bugs 2008-09-11 21:24 eeking out a small simplification by using the other end of the address range 2008-09-11 21:24 twisted 2008-09-11 21:24 I like it 2008-09-11 21:24 right you have to be signedness clean, or you can offset everything by a zero offset constant 2008-09-11 21:24 right 2008-09-11 21:24 like I way 2008-09-11 21:24 probably turn up a couple core linux bugs there 2008-09-11 21:24 but worth doing just for that reason 2008-09-11 21:24 or you can even just store it like this 0:hello empty space for expansion reverse xattrs :-1 2008-09-11 21:25 since you have to support holes anyway... 
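The top-of-index-range idea reduces to a little arithmetic. This is a minimal sketch, not tux3 code: it assumes a 32-bit page-cache index and 4 KiB pages (the 32-bit Linux case discussed above), and the names `xattr_page_index` and `max_data_bytes` are hypothetical. Xattr bytes are numbered downward from the end of the address space, so xattr offset 0 lands in the very last page and the xattr area grows toward the file data.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 32-bit page-cache index, 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_MAX_INDEX  UINT32_MAX

/* Page index holding byte 'xoff' of the xattr area. Xattr offsets count
 * downward from the end of the address space: xoff 0..4095 live in the
 * last page, 4096..8191 in the page before it, and so on. */
static uint32_t xattr_page_index(uint64_t xoff)
{
    assert(xoff < ((uint64_t)SKETCH_MAX_INDEX + 1) << SKETCH_PAGE_SHIFT);
    return SKETCH_MAX_INDEX - (uint32_t)(xoff >> SKETCH_PAGE_SHIFT);
}

/* Largest ordinary file size still addressable once the top
 * 'xattr_pages' page indices are reserved for xattrs. */
static uint64_t max_data_bytes(uint32_t xattr_pages)
{
    uint64_t pages = (uint64_t)SKETCH_MAX_INDEX + 1 - xattr_pages;
    return pages << SKETCH_PAGE_SHIFT;
}
```

With no xattr pages, `max_data_bytes(0)` is 2^32 × 4096 = 2^44 bytes, the 16 TB limit; each reserved page shaves 4 KiB off it, which is the "16 TB less the maximum size of attributes" trade-off from the chat.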
2008-09-11 21:25 sure 2008-09-11 21:25 it allows us to treat xattrs more like file data in kernel 2008-09-11 21:25 that's a tux3 meme 2008-09-11 21:25 exactly 2008-09-11 21:25 so I like it 2008-09-11 21:26 it means xattr support in the fs on-disk image is basically free 2008-09-11 21:26 for now we have the "xcache" 2008-09-11 21:26 which is even faster to access than a page cache mapping page 2008-09-11 21:26 well 2008-09-11 21:26 hmm 2008-09-11 21:26 is it? 2008-09-11 21:26 somewhat 2008-09-11 21:26 I think it's mostly free 2008-09-11 21:26 gets close 2008-09-11 21:26 I was going to have a separate btree for big xattrs 2008-09-11 21:27 and small ones go in the inode, just like immediate file data 2008-09-11 21:27 (still imagining a world with just one btree) 2008-09-11 21:27 but mapping intermediate sized attributes into the top of the file address space is a possibility 2008-09-11 21:27 theoretically you can put almost all file metadata at the -1 point 2008-09-11 21:27 not only xattrs 2008-09-11 21:27 the just-one-btree idea has already been done, it's called hammer 2008-09-11 21:27 not sure how that would work for performance 2008-09-11 21:28 but you'd get versioning for free 2008-09-11 21:28 I think that two level btree is significantly more cache efficient 2008-09-11 21:28 I've played with mapping file metadata into the file address space before 2008-09-11 21:28 perhaps. 2008-09-11 21:28 without joy 2008-09-11 21:28 spent a lot of mental energy on it, found no real wins 2008-09-11 21:28 where are the problems? 2008-09-11 21:29 finding a reason to do it 2008-09-11 21:29 an example that runs faster 2008-09-11 21:29 yeah, it's probably worth optimizing the hell out of inode stat time 2008-09-11 21:29 stat time? 2008-09-11 21:29 ah 2008-09-11 21:30 yes 2008-09-11 21:30 how fast you can stat a bunch of inodes 2008-09-11 21:30 tux3 is going to work very well there 2008-09-11 21:30 basically just run down the inode table 2008-09-11 21:30 wait a minute, it's a table?
not a btree? 2008-09-11 21:30 and the inode table will be intentionally laid out in a clumpy way 2008-09-11 21:30 it's a btree 2008-09-11 21:30 oh, ok. 2008-09-11 21:30 call it a table for historical reasons 2008-09-11 21:31 variable size inodes 2008-09-11 21:31 a tux3 exclusive, maybe 2008-09-11 21:31 really defines the design and implementation 2008-09-11 21:31  2) Refcount all atoms and delete any that fall to zero <- my vote 2008-09-11 21:31 mine too 2008-09-11 21:31 just challenging to do as fast as the crude approach 2008-09-11 21:31 possibly delaying cleanup till unmount, not sure if that would ease up anything though 2008-09-11 21:32 tux3 has the concept of log rollup 2008-09-11 21:32 I'll be posting about that in much more detail over the next week or so 2008-09-11 21:32 it's continuous cleanup 2008-09-11 21:32 doesn't have to be a flurry of cleanup either at umount or mount 2008-09-11 21:32 or remount after crash even 2008-09-11 21:33 you can actually put it in the btree ;-) 2008-09-11 21:33 why? 2008-09-11 21:33 you want search through it to be efficient - both ways 2008-09-11 21:33 oh right 2008-09-11 21:33 both atom -> string conversion and string -> atom conversion 2008-09-11 21:33 interesting idea 2008-09-11 21:33 oh 2008-09-11 21:33 I thought you meant the log 2008-09-11 21:33 have some reserved btree prefix 2008-09-11 21:34 of course the atom table will be a btree 2008-09-11 21:34 it will be an HTree in facrt 2008-09-11 21:34 fact 2008-09-11 21:34 the log?
yeah though about how the log could be in the btree 2008-09-11 21:34 even had some half-baked concept, but didn't think about it long enough to really know if that's worth even thinking about 2008-09-11 21:34 turns out that the deficiencies of HTree that make it tough to implement readdir accurately don't apply at all to the xattr atom use case 2008-09-11 21:35 and htree is just about optimal for that 2008-09-11 21:35 atom->string is just an array, since there's no holes 2008-09-11 21:35 as far as reverse conversion goes... 2008-09-11 21:35 there are two ideas I'm considering 2008-09-11 21:35 one is to use the address of the dirent as the atom number 2008-09-11 21:36 this decreases thedensity of the atom space somewaht 2008-09-11 21:36 huh, how does that work? 2008-09-11 21:36 by a factor of 4 to be precise 2008-09-11 21:36 oh, right, I think I see 2008-09-11 21:36 just look up the dirent and return the offset fromthe beginning of the file as the atom number 2008-09-11 21:36 have the atoms themselves be pointers 2008-09-11 21:36 cute 2008-09-11 21:36 ACTION has to put a different keyboard onthis machine with a better space bar 2008-09-11 21:36 right 2008-09-11 21:37 the other option is to have a reverse lookup table, that points back at the dirents 2008-09-11 21:37 potentially div 4 or something to make em more likely to fit in a byte 2008-09-11 21:37 I favor the second 2008-09-11 21:37 because I like the atoms to be as dense as possible 2008-09-11 21:37 for compression reasons 2008-09-11 21:37 I already took the div4 into account ;-) 2008-09-11 21:37 I'm still not convinced compression of this part of the fs really matters... 
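The two reverse-lookup options flips describes can be contrasted in a few lines. This is a hypothetical sketch, not the tux3 atom table: it assumes dirents in the atom table file are 4-byte aligned, and `atom_from_offset`, `struct atom_table` and `atom_alloc` are illustrative names. The string-to-atom direction (the HTree in the chat) is not modeled. Option 1 derives the atom from the dirent's byte offset divided by 4, so atom numbers skip values wherever a dirent spans several 4-byte units; option 2 hands out dense atoms and keeps a reverse table pointing back at the dirents.

```c
#include <assert.h>
#include <stdint.h>

/* Option 1: the atom is the dirent's byte offset in the atom table
 * file divided by 4 (dirents assumed 4-byte aligned). */
static uint32_t atom_from_offset(uint32_t dirent_offset)
{
    assert(dirent_offset % 4 == 0);
    return dirent_offset / 4;
}

/* Option 2: dense atoms 0, 1, 2, ... plus a reverse table mapping each
 * atom back to the offset of its dirent. */
#define MAX_ATOMS 65536 /* what a 16-bit atom field can address */

struct atom_table {
    uint32_t count;
    uint32_t dirent_offset[MAX_ATOMS];
};

/* Returns the new atom, or -1 once the 16-bit atom space is used up. */
static int32_t atom_alloc(struct atom_table *t, uint32_t dirent_offset)
{
    if (t->count == MAX_ATOMS)
        return -1;
    t->dirent_offset[t->count] = dirent_offset;
    return (int32_t)t->count++;
}
```

With a 16-bit atom field, option 2 gives 65536 atoms; if option 1 leaves the numbering about 4 times sparser, only about 2^14 names fit, which is the "14 bits not so much" concern raised next.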
2008-09-11 21:38 sure it does 2008-09-11 21:38 atom number field is currently 16 bits 2008-09-11 21:38 64K atoms 2008-09-11 21:38 before having to go to a 32 bit atom number 2008-09-11 21:38 that's comfortable 2008-09-11 21:38 -!- stargazr5(~gauravstt@59.95.38.255) has joined #tux3 2008-09-11 21:38 14 bits not so much 2008-09-11 21:38 still 2008-09-11 21:39 could go either way on that 2008-09-11 21:39 Terrible hack: 2008-09-11 21:39 $ getfattr -n user.hash -e text -h --absolute-names -L xhash 2008-09-11 21:39 # file: xhash 2008-09-11 21:39 user.hash="1114234:1191:1219215805:e233bf8dd0415ec9b7fea0193803357c:6325f0060bd5f23cf6ba106fd6500efa76d9bc5e" 2008-09-11 21:39 Storing mtime/md5sum/sha1sum in a xattr for fs recovery ;-) 2008-09-11 21:39 got to decide by midnight ;-) 2008-09-11 21:39 ? 2008-09-11 21:40 so I store the mtime:md5sum:sha1sum of each file on my drive in a xattr for that file 2008-09-11 21:40 I get constant time md5sum calculation on files 2008-09-11 21:40 ease of verifying file integrity 2008-09-11 21:40 cool 2008-09-11 21:40 and I can verify integrity of files in case of fs crash (ie. like when I upgraded to 2.6.27-rc3) 2008-09-11 21:41 I think I like it more than zfs "checksum everything" mentality 2008-09-11 21:41 makes sense to only checksum logically 2008-09-11 21:41 and yes it does need to be regenerated on file modifications, so the newest files lack it 2008-09-11 21:41 sha1 is ok only if you want cryptographic verifiability, otherwise it's slower than necessary 2008-09-11 21:42 compare that with my laptop 20mb/s read speed... 2008-09-11 21:42 and it doesn't matter 2008-09-11 21:42 it matters if you're running a server 2008-09-11 21:42 a lot 2008-09-11 21:42 true 2008-09-11 21:42 option?
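A user.hash value like MaZe's can be checked for staleness before the cached digests are trusted. This is a minimal sketch under stated assumptions: the value format is inferred only from the single example in the log; the last three colon-separated fields are taken to be mtime, MD5 and SHA-1, while the first two fields are not explained there and are skipped. `parse_user_hash` and `hash_is_fresh` are hypothetical names. On Linux the raw value would come from getxattr(2) and the current mtime from stat(2); both calls are left out so the parsing logic stands alone.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct file_hash {
    long long mtime;
    char md5[33];   /* 32 hex digits + NUL */
    char sha1[41];  /* 40 hex digits + NUL */
};

/* Parse "f1:f2:mtime:md5:sha1" (assumed layout); 0 on success, -1 otherwise. */
static int parse_user_hash(const char *value, struct file_hash *out)
{
    assert(value != NULL && out != NULL);
    int n = sscanf(value,
                   "%*[^:]:%*[^:]:%lld:%32[0123456789abcdef]:%40[0123456789abcdef]",
                   &out->mtime, out->md5, out->sha1);
    if (n != 3 || strlen(out->md5) != 32 || strlen(out->sha1) != 40)
        return -1;
    return 0;
}

/* The cached digests describe the file only if it has not been
 * modified since they were computed, i.e. the mtimes still match. */
static int hash_is_fresh(const struct file_hash *h, long long current_mtime)
{
    return h->mtime == current_mtime;
}
```

The freshness check mirrors the caveat in the chat that the attribute must be regenerated on modification: any later write changes the mtime, and `hash_is_fresh` then reports the stored digests as stale.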
2008-09-11 21:42 right 2008-09-11 21:42 probably include something like crc64 or whatever cheap 64-bit hash you can find 2008-09-11 21:43 (no idea what a fast good 64-bit hash is nowadays) 2008-09-11 21:43 crc is bad 2008-09-11 21:43 funnels to hell 2008-09-11 21:43 dx_hack_hash is getting closer 2008-09-11 21:43 uses a hacked lfsr idea 2008-09-11 21:43 needs analysis 2008-09-11 21:44 maze, you'd be good at that 2008-09-11 21:44 I think 2008-09-11 21:44 we appear to have scared off everybody else... no one is asking any other questions ;-) 2008-09-11 21:44 yeah 2008-09-11 21:44 analysis of speed? or of hash spread? 2008-09-11 21:44 and they're the ones who actually check in code ;-) 2008-09-11 21:45 got to be careful about that 2008-09-11 21:45 hash spread 2008-09-11 21:45 etc 2008-09-11 21:45 I'm in the middle of a cluster turn up... 2008-09-11 21:45 speed is about optimal 2008-09-11 21:45 I made sure of that 2008-09-11 21:45 well 2008-09-11 21:45 truth be told I could make it much faster 2008-09-11 21:45 hopefully I can at least provide 'inspiration' or something 2008-09-11 21:45 it's meant for hashing short strings with good spread 2008-09-11 21:46 short, very nonrandom strings 2008-09-11 21:46 does a good job of that 2008-09-11 21:46 re: atom refcounting 2008-09-11 21:46 you don't have to sync it to disk really if you are hacky/smart about it 2008-09-11 21:47 I'm going to post the results of my design thinking from the skate earlier 2008-09-11 21:47 really?
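A possible starting point for the hash-spread analysis MaZe is asked about. The hash routine below is dx_hack_hash as it appears in the ext3/ext4 htree code (the unsigned-char variant in fs/ext4/hash.c), reproduced for experimentation, so check it against the kernel source before relying on it. The `spread_chi_square` harness is a hypothetical addition: it feeds short, very nonrandom names (`name0`, `name1`, ...) into 256 buckets chosen by the top 8 bits and returns Pearson's chi-square statistic, which for a well-spread hash should land near the bucket count minus one (255).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* dx_hack_hash from the ext3/ext4 htree code (unsigned-char variant). */
static uint32_t dx_hack_hash(const char *name, int len)
{
    uint32_t hash, hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
    const unsigned char *ucp = (const unsigned char *)name;

    while (len--) {
        hash = hash1 + (hash0 ^ ((uint32_t)*ucp++ * 7152373u));
        if (hash & 0x80000000u)
            hash -= 0x7fffffffu;
        hash1 = hash0;
        hash0 = hash;
    }
    return hash0 << 1;
}

#define NBUCKETS 256

/* Hash n sequential names into NBUCKETS buckets by their top 8 bits and
 * return the chi-square statistic against a uniform distribution. */
static double spread_chi_square(int n)
{
    unsigned counts[NBUCKETS] = {0};
    char buf[32];

    assert(n > 0);
    for (int i = 0; i < n; i++) {
        int len = snprintf(buf, sizeof buf, "name%d", i);
        counts[dx_hack_hash(buf, len) >> 24]++;
    }

    double expected = (double)n / NBUCKETS, chi = 0.0;
    for (int b = 0; b < NBUCKETS; b++) {
        double d = counts[b] - expected;
        chi += d * d / expected;
    }
    return chi;
}
```

Repeating the run with other name patterns (shared prefixes, dotted extensions, numeric suffixes) is the kind of test that would expose funneling such as the CRC behavior flips complains about.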
2008-09-11 21:47 since you can put it in the log 2008-09-11 21:47 sounds like magic 2008-09-11 21:47 of course 2008-09-11 21:47 planned 2008-09-11 21:47 or I wouldn't have gone this route at all 2008-09-11 21:47 and if the order is right, then it can never get out of sync 2008-09-11 21:47 again, of course 2008-09-11 21:47 and the entire thing should be small enough you can periodically just write out a new copy 2008-09-11 21:47 I've been computing the exact percentages of log bandwidth that will be required ;-) 2008-09-11 21:47 of the entire thing 2008-09-11 21:48 again, of course 2008-09-11 21:48 but we don't 2008-09-11 21:48 we even do that incrementally 2008-09-11 21:48 and: you can afford to lose decrements, since at most the ref counts will be too high 2008-09-11 21:48 and arrange the structures that have to be updated to be close together 2008-09-11 21:48 and compact 2008-09-11 21:48 ah 2008-09-11 21:48 which is kind of dirty... 2008-09-11 21:48 really? 2008-09-11 21:48 way dirty 2008-09-11 21:48 but there is likely something there 2008-09-11 21:49 you can't lose track long term 2008-09-11 21:49 that would be bad 2008-09-11 21:49 -!- tim_dimm(~timothyhu@adsl-67-114-40-138.dsl.scrm01.pacbell.net) has joined #tux3 2008-09-11 21:49 but you can do a false-positive-to-be-tested-later kind of thing 2008-09-11 21:49 I'd guess most fs'es will have 2 dozen or less atoms 2008-09-11 21:49 hey tim_dimm 2008-09-11 21:49 welcome back, daddy! 2008-09-11 21:49 hey 2008-09-11 21:49 wassap? 2008-09-11 21:49 well 2008-09-11 21:50 they're doing good 2008-09-11 21:50 we just did episode 2 of tux3 university 2008-09-11 21:50 how'd I do, maze? 2008-09-11 21:50 ah, missed it! 2008-09-11 21:50 keeping you interested, hit the right level? 2008-09-11 21:50 ok, though I think the first was more action packed 2008-09-11 21:50 enough swear words? too many?
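MaZe's point that lost decrements are tolerable, because counts can then only end up too high, can be sketched as a recount that corrects overstated counts. This is a hypothetical illustration, not the log-based scheme flips has planned: persistence and the log are not modeled, and a flat array of atom uses stands in for a scan of the atoms referenced by inodes. An atom whose count is nonzero but which the recount finds unused is the "false positive to be tested later".

```c
#include <assert.h>
#include <stdint.h>

#define NATOMS 16

struct atom_refs {
    uint32_t count[NATOMS];   /* may overstate real use, never understate */
};

static void atom_get(struct atom_refs *r, unsigned atom)
{
    assert(atom < NATOMS);
    r->count[atom]++;
}

static void atom_put(struct atom_refs *r, unsigned atom)
{
    assert(atom < NATOMS && r->count[atom] > 0);
    r->count[atom]--;
}

/* Rebuild exact counts from the atoms actually in use ('uses' stands in
 * for an inode table scan) and return how many atoms that the stored
 * counts claimed were live turned out to be unused. */
static unsigned atom_recount(struct atom_refs *r, const unsigned *uses, unsigned nuses)
{
    uint32_t exact[NATOMS] = {0};
    unsigned freed = 0;

    for (unsigned i = 0; i < nuses; i++) {
        assert(uses[i] < NATOMS);
        exact[uses[i]]++;
    }
    for (unsigned a = 0; a < NATOMS; a++) {
        assert(exact[a] <= r->count[a]); /* lost decrements only overcount */
        if (r->count[a] != 0 && exact[a] == 0)
            freed++;
        r->count[a] = exact[a];
    }
    return freed;
}
```

The assertion in `atom_recount` encodes the invariant that makes lost decrements safe: a stored count may overstate real use but must never understate it, since an understated count could free an atom that inodes still reference.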
2008-09-11 21:50 hehe 2008-09-11 21:50 well I can easily pick up the pace 2008-09-11 21:50 it's just that, where we were is where the tux3 kernel port willa actully start 2008-09-11 21:51 I wish there was a: these are your primitives, this is how they function, know this and C and data structures and you don't need to know anything else linux specific 2008-09-11 21:51 can shapor put together one of those word clouds for tux3 university? 2008-09-11 21:51 in an ideal world 2008-09-11 21:51 word cloud? 2008-09-11 21:51 ah 2008-09-11 21:51 scrape the logs 2008-09-11 21:51 make a book ;-) 2008-09-11 21:51 you know, a clear definition of interfaces ;-) 2008-09-11 21:51 you know, the more common words are bigger 2008-09-11 21:51 full of swear words 2008-09-11 21:52 and embarrassing stories about certain well known kernel hackers 2008-09-11 21:52 who joined the U? 2008-09-11 21:52 maze, good louck with that 2008-09-11 21:52 bunch of people up there in the channel list 2008-09-11 21:52 good folks 2008-09-11 21:52 I'm seeing a lot of new names on the irc 2008-09-11 21:52 missed natalie tonight 2008-09-11 21:52 yes, it's bulking up 2008-09-11 21:52 so is the subscribe list 2008-09-11 21:52 and so are the checkins 2008-09-11 21:53 can ask for more 2008-09-11 21:53 what's that up to? 2008-09-11 21:53 you may want to have something selinux specific to optimize that straight in the metadata 2008-09-11 21:53 same for acl's 2008-09-11 21:53 tux3 subscribers are over 100 2008-09-11 21:53 a dictionary with a few hundred entries would optimize down all selinux entries on my machine down to 3 bytes 2008-09-11 21:53 MaZe: you refering to the xattrs? 
2008-09-11 21:53 maze, I'm waiting for the selinux people to smell the coffee and come tell use what we're doing right/wrong 2008-09-11 21:53 maze, I have reason to believe that will happen soon ;-) 2008-09-11 21:54 problem is: dictionary needs to be dynamic 2008-09-11 21:54 so basically atoms x 2 2008-09-11 21:54 maze, just what I'm thinking 2008-09-11 21:54 read the posts 2008-09-11 21:54 but again, that's kind of vital to be fast 2008-09-11 21:54 you'll see I addressed that specifically 2008-09-11 21:54 you know where I work, you know how much email I get 2008-09-11 21:54 ;-) 2008-09-11 21:54 the last thing I want to do when I'm 'done' with work is read more email 2008-09-11 21:54 the more details post talks about it I think 2008-09-11 21:55 well read it on the job 2008-09-11 21:55 still parsing (in 2nd window) 2008-09-11 21:55 everybody knows sre's have that kind of time ;-) 2008-09-11 21:55 yeah... 2008-09-11 21:55 when there are no data centers burning down 2008-09-11 21:56 it's job-related 2008-09-11 21:56 so xattr's don't need to be fast - (overly fast) - for anything - except for the parts that are actually used within the kernel 2008-09-11 21:56 got to keep the finger on the pulse of sel development 2008-09-11 21:56 ie. 
acl's and selinux 2008-09-11 21:56 yup 2008-09-11 21:56 got that in my post too 2008-09-11 21:56 selinux is on every file, acl only on special files 2008-09-11 21:56 haven't gotten there obviously yet 2008-09-11 21:57 the xattr interface kind of sucks for efficiency 2008-09-11 21:57 choke point 2008-09-11 21:58 another solution, is to support atoms for the important stuff, and leave the rest as strings 2008-09-11 21:58 that way you compress selinux/acl but leave the user stuff uncompressed 2008-09-11 21:59 don't have to deal with denial of service against atom space attacks 2008-09-11 21:59 kind of best of both worlds 2008-09-11 21:59 possibly allow a superblock list of optimized entries, and a utility (mount time option), to include a new atom 2008-09-11 22:00 that might be both simple (very) and efficient and trivial to implement 2008-09-11 22:00 flips: see the latest list post? 2008-09-11 22:01 -!- ebiederm(~eric@c-24-130-11-59.hsd1.ca.comcast.net) has left #tux3 2008-09-11 22:01 you still have to deal with negative lookups correctly, but that's an easy optimization 2008-09-11 22:01 maze, also an option, yes 2008-09-11 22:01 important = atom 2008-09-11 22:02 exactly 2008-09-11 22:02 negative lookups? 2008-09-11 22:02 and what is actually an atom is specified by the admin during (previous and current) mount 2008-09-11 22:02 konrad,not yet 2008-09-11 22:02 if you have fs with 'selinux' not being an atom 2008-09-11 22:02 and have that written out to disk as a string 2008-09-11 22:03 and then you remount with selinux as atom (so it promotes) 2008-09-11 22:03 then if you lookup selinux atom on a file with it from before the remount you won't find it, unless you search the string entries as well 2008-09-11 22:03 in which case lack of the field, must mean search for the string instead and promote if needed to atom 2008-09-11 22:04 but the atom table is part of the fs, so how does remoutn come into it? 
2008-09-11 22:04 the atom table can't be shrunk 2008-09-11 22:04 but can have new entries added via mount options 2008-09-11 22:04 I sort of get it 2008-09-11 22:04 ie. mount -o atomize=selinux -t tux3 /dev/hda3 / 2008-09-11 22:04 would be worth a list post maybe 2008-09-11 22:05 and than at the beginning you atomize (awesome term) the entries you know will be common 2008-09-11 22:05 so the security.selinux 2008-09-11 22:05 anyway, I think the truth is, the refcounting is going to be so efficient that nobody will care about the slight overhead and will love the warm fuzzy feeling of compression 2008-09-11 22:05 the subpieces of security selinux (since it's split in 3 parts) 2008-09-11 22:05 and being able to use long xattr names without penalty 2008-09-11 22:05 refcounting does have issues with quota 2008-09-11 22:06 unless you count xattrs against user quota 2008-09-11 22:06 which you probably should... 2008-09-11 22:06 yes, the refcounting is primarily to address quota 2008-09-11 22:06 my solution has the benefit you don't need refcounts 2008-09-11 22:06 you still get optimal performance for anything that matters 2008-09-11 22:07 - what matters being selected by the admin (and you can compile in a list of default atoms into tux3, being what we grab from selinux in fedora or whatever) 2008-09-11 22:07 do acl's store numeric ids or text strings? 2008-09-11 22:07 (ids = uids/gids) 2008-09-11 22:07 I'd hope numeric... 2008-09-11 22:09 anyway, that way you should be able to store all selinux data straight along with the mtime/ctime in the inode, using up a 32bit int or something like that 2008-09-11 22:12 btw, you're wrong on the ACLs being the most important use of xattr's - selinux is _BY FAR_ 2008-09-11 22:14 maze, could you work it up into a post? 
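The promotion rule MaZe describes, for a name atomized at mount time after some files already stored it as a plain string, can be sketched as a lookup that falls back to string entries and promotes on a hit. This is a hypothetical sketch: the per-inode layout, the fixed `atom_names` list standing in for something like the `-o atomize=...` option floated above, and all function names are assumptions rather than tux3 structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ATTRS 8

/* One xattr of one inode, stored either under an atom or, if it was
 * written before its name was atomized, under the plain string name. */
struct xattr_entry {
    int atom;           /* -1 when stored by string name */
    const char *name;   /* meaningful only when atom == -1 */
    const char *value;
};

struct inode_xattrs {
    int count;
    struct xattr_entry e[MAX_ATTRS];
};

/* Assumed mount-time atom list (index = atom number). */
static const char *const atom_names[] = {
    "security.selinux",
    "system.posix_acl_access",
};
#define NATOM_NAMES ((int)(sizeof atom_names / sizeof atom_names[0]))

static int atom_of(const char *name)
{
    for (int i = 0; i < NATOM_NAMES; i++)
        if (strcmp(atom_names[i], name) == 0)
            return i;
    return -1;
}

/* Find 'name' on an inode. A name that is now an atom may still be
 * stored as a string, so string entries are always searched as well;
 * a string hit for an atomized name is promoted to the atom in place. */
static const char *xattr_lookup(struct inode_xattrs *x, const char *name)
{
    assert(x != NULL && name != NULL);
    int atom = atom_of(name);

    for (int i = 0; i < x->count; i++) {
        struct xattr_entry *e = &x->e[i];
        if (atom >= 0 && e->atom == atom)
            return e->value;
        if (e->atom < 0 && strcmp(e->name, name) == 0) {
            if (atom >= 0) {
                e->atom = atom;
                e->name = NULL;
            }
            return e->value;
        }
    }
    return NULL;
}
```

The string fallback is the "negative lookups" case from the chat: without it, a file written before the remount would appear to lack its security.selinux attribute.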
2008-09-11 22:14 btw 2008-09-11 22:14 if you look at man getfacl, you'll see: 2008-09-11 22:14 1: # file: somedir/ 2008-09-11 22:14 2: # owner: lisa 2008-09-11 22:14 3: # group: staff 2008-09-11 22:14 4: user::rwx 2008-09-11 22:14 5: user:joe:rwx #effective:r-x 2008-09-11 22:14 6: group::rwx #effective:r-x 2008-09-11 22:14 7: group:cool:r-x 2008-09-11 22:14 8: mask:r-x 2008-09-11 22:14 9: other:r-x 2008-09-11 22:14 10: default:user::rwx 2008-09-11 22:14 11: default:user:joe:rwx #effective:r-x 2008-09-11 22:14 I hope acl's are binary but I don't know yet 2008-09-11 22:14 12: default:group::r-x 2008-09-11 22:14 13: default:mask:r-x 2008-09-11 22:14 14: default:other:--- 2008-09-11 22:15 from which you'll notice that for permissions you want to fit the standard ugo+-rwx in the inode 2008-09-11 22:15 but also 2008-09-11 22:15 some of the other stuff which isn't per used 2008-09-11 22:15 s/used/user/ 2008-09-11 22:15 I can't tell if the other latest list post is spam or not 2008-09-11 22:15 so what is going to be cool is also doing atoms on the acl bodies 2008-09-11 22:15 which one? 2008-09-11 22:16 the mask on line 8 in particular 2008-09-11 22:16 "sir I want join your proyect" 2008-09-11 22:16 not spam 2008-09-11 22:16 definitely 2008-09-11 22:16 they do some clever spam nowadays 2008-09-11 22:16 directories also need the default ACLs off of lines 10-14 2008-09-11 22:16 konrad, you meant the latest from tero? 
2008-09-11 22:16 which I think don't need special handling 2008-09-11 22:17 flips: nanden yen 2008-09-11 22:17 Not Tero 2008-09-11 22:17 yes, the best kind 2008-09-11 22:17 the problem with acl's though is that of course the amount of space they can take up is unbounded 2008-09-11 22:17 I'll respond 2008-09-11 22:17 maze, sure 2008-09-11 22:17 so only atomize the short ones 2008-09-11 22:17 although you can get it down to like 40 bits per entry 2008-09-11 22:18 so acl's don't really need atomization per se 2008-09-11 22:18 selinux needs atomization 2008-09-11 22:18 ah 2008-09-11 22:18 because selinux stores arbitrary strings 2008-09-11 22:18 I didn't realize the distinction 2008-09-11 22:19 it has to do compartments and stuff 2008-09-11 22:19 acl's basically store the above listed fields (user, user:uid, group, group:gid, mask, other) with 3 bits (rwx) and possibly the actual uid/gid (32 bits) 2008-09-11 22:19 that's not an acl as I understand it 2008-09-11 22:20 that is DAC 2008-09-11 22:20 course getfacl is definitive ;-) 2008-09-11 22:20 $ getfattr -d -m .
-e text -h --absolute-names -L /etc 2008-09-11 22:20 # file: /etc 2008-09-11 22:20 security.selinux="system_u:object_r:etc_t:s0\000" 2008-09-11 22:20 so there you have an example selinux xattr 2008-09-11 22:20 the system_u object_r and etc_t are going to appear on tens of thousands of files 2008-09-11 22:20 -!- tim_dimm_(~timothyhu@adsl-67-114-40-138.dsl.scrm01.pacbell.net) has joined #tux3 2008-09-11 22:20 and come from a set of like maybe 200 entries on my system 2008-09-11 22:21 it's going to be very satisfying to compress "security.selinux" to 2 bytes 2008-09-11 22:21 so you need to have security.selinux xattr stored as a 32 bit field in the inode 2008-09-11 22:21 storing 10 bits per each of the 3 fields (u/r/t) 2008-09-11 22:21 and having a dictionary for each of those fields 2008-09-11 22:22 [actually 4 bytes for the entire thing most likely] 2008-09-11 22:22 we could build selinux fields directly into tux3 inodes if they're decent 2008-09-11 22:22 all tux3 attributes are optional 2008-09-11 22:22 exactly - you really want to do that 2008-09-11 22:22 so there's no real cost 2008-09-11 22:22 I guess we better do that 2008-09-11 22:22 could you write a post asking for that? 2008-09-11 22:22 and explaining what they are? 
:-) 2008-09-11 22:22 heh 2008-09-11 22:22 I'd need to run some stats gathering on my local system 2008-09-11 22:23 sure 2008-09-11 22:23 that's a yes I take it 2008-09-11 22:23 ACTION considers the ramyun deficiency problem 2008-09-11 22:24 ok, generating xattr dump from my machine 2008-09-11 22:24 I'll write something up 2008-09-11 22:24 about selinux/acl/other xattrs 2008-09-11 22:24 and what's important 2008-09-11 22:24 kay, I'll get some ranyun into me then prototype the refcounting 2008-09-11 22:24 and include something about my md5/sha1 idea above into it 2008-09-11 22:24 yah 2008-09-11 22:24 to point out why you want it to be user extensible 2008-09-11 22:25 nice 2008-09-11 22:25 for selinux you can even refuse to accept stuff from outside the dictionary 2008-09-11 22:26 although to be fair that's only settable by root, so might as well just transparently extend the dicts 2008-09-11 22:29 we can have some fun 2008-09-11 22:29 letting security folks play with stuff 2008-09-11 22:29 btw, here's a file with both selinux attrs, and extended acls (extra read rights to local user) 2008-09-11 22:29 $ getfattr -d -m . -e text -h --absolute-names -L junk 2008-09-11 22:29 # file: junk 2008-09-11 22:29 security.selinux="unconfined_u:object_r:default_t:s0\000" 2008-09-11 22:29 system.posix_acl_access="\002\000\000\000\001\000\007\000\377\377\377\377\002\000\004\000d\000\000\000\004\000\005\000\377\377\377\377\020\000\005\000\377\377\377\377 \000\005\000\377\377\377\377" 2008-09-11 22:29 Would you have guessed? 2008-09-11 22:30 everybody needs to have fun 2008-09-11 22:30 ah 2008-09-11 22:30 looks binary 2008-09-11 22:30 notice how 'local' (a user name) doesn't show up 2008-09-11 22:30 very 2008-09-11 22:30 instead the uid (100) does 2008-09-11 22:31 where is it? 2008-09-11 22:31 100 = 0144 = 'd' 2008-09-11 22:32 (there's extra stuff there that always gets set on any file with extended acls... basically cruft...) 
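The 32-bit SELinux encoding MaZe proposes, 10 bits for each of the user, role and type fields with a dictionary per field, can be sketched as follows. It is a hypothetical illustration: the dictionary storage, the 63-character name limit and the function names are assumptions, and the MLS level (the trailing `s0`) is simply ignored rather than encoded.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FIELD_BITS   10
#define DICT_MAX     (1u << FIELD_BITS)   /* 1024 names per field */
#define NAME_MAX_LEN 64

struct dict {
    unsigned count;
    char names[DICT_MAX][NAME_MAX_LEN];
};

/* Index of 'name' in the dictionary, adding it if new; -1 when full. */
static int dict_intern(struct dict *d, const char *name)
{
    assert(d != NULL && name != NULL);
    for (unsigned i = 0; i < d->count; i++)
        if (strcmp(d->names[i], name) == 0)
            return (int)i;
    if (d->count == DICT_MAX || strlen(name) >= NAME_MAX_LEN)
        return -1;
    strcpy(d->names[d->count], name);
    return (int)d->count++;
}

/* Pack "user:role:type[:level]" into the low 30 bits of a 32-bit value:
 * user in bits 0-9, role in bits 10-19, type in bits 20-29.
 * Returns -1 if the context is malformed or a dictionary is full. */
static int64_t pack_context(struct dict dicts[3], const char *context)
{
    char user[NAME_MAX_LEN], role[NAME_MAX_LEN], type[NAME_MAX_LEN];
    if (sscanf(context, "%63[^:]:%63[^:]:%63[^:]", user, role, type) != 3)
        return -1;

    const char *parts[3] = { user, role, type };
    uint32_t packed = 0;
    for (int f = 0; f < 3; f++) {
        int idx = dict_intern(&dicts[f], parts[f]);
        if (idx < 0)
            return -1;
        packed |= (uint32_t)idx << (FIELD_BITS * f);
    }
    return packed;
}
```

Each of the two sample contexts from the log then costs 4 bytes in the inode, and with roughly 200 distinct names in play the 1024-entry dictionaries leave headroom.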
2008-09-11 22:32 there sure are a lot of 377s 2008-09-11 22:32 it looks like it uses 32-bit ints 2008-09-11 22:33 probably for the u/gids 2008-09-11 22:33 wow, that d is really well hidden 2008-09-11 22:33 hehe 2008-09-11 22:33 bad choice of username 2008-09-11 22:33 crappy dump 2008-09-11 22:33 see hexdump.c 2008-09-11 22:34 yeah ;-), well I asked for text 2008-09-11 22:34 $ getfattr -d -m . -e hex -h --absolute-names -L junk 2008-09-11 22:34 # file: junk 2008-09-11 22:34 security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a64656661756c745f743a733000 2008-09-11 22:34 system.posix_acl_access=0x0200000001000700ffffffff020004006400000004000500ffffffff10000500ffffffff20000500ffffffff 2008-09-11 22:34 still not readable - but better 2008-09-11 22:34 just installed the acl package 2008-09-11 22:34 I will try to clue up a bit 2008-09-11 22:34 the fs has to be mounted with acl support 2008-09-11 22:35 basically all you care about are getfattr/setfattr for xattr mods 2008-09-11 22:35 tux3 will always mount with xattr support, anyway 2008-09-11 22:35 and getfacl/setfacl for acl stuff 2008-09-11 22:35 I wonder what acl support is 2008-09-11 22:35 extra module?
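The hex dump of system.posix_acl_access decodes cleanly once the layout is known: a little-endian 4-byte version (2), then one 8-byte entry per ACL line holding a 16-bit tag, 16-bit permission bits and a 32-bit id, with 0xffffffff meaning "no id". That layout and the tag values follow the Linux posix_acl xattr format. The decoder itself, the `ACL_TAG_*` names (chosen to avoid clashing with <sys/acl.h>) and the 5-byte repacking that illustrates the "40 bits per entry" remark are a hypothetical sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    ACL_TAG_USER_OBJ  = 0x01,
    ACL_TAG_USER      = 0x02,
    ACL_TAG_GROUP_OBJ = 0x04,
    ACL_TAG_GROUP     = 0x08,
    ACL_TAG_MASK      = 0x10,
    ACL_TAG_OTHER     = 0x20,
};

struct acl_entry { uint16_t tag, perm; uint32_t id; };

/* The system.posix_acl_access value of "junk" from the hex dump. */
static const uint8_t junk_acl[] = {
    0x02, 0x00, 0x00, 0x00,
    0x01, 0x00, 0x07, 0x00, 0xff, 0xff, 0xff, 0xff,
    0x02, 0x00, 0x04, 0x00, 0x64, 0x00, 0x00, 0x00,
    0x04, 0x00, 0x05, 0x00, 0xff, 0xff, 0xff, 0xff,
    0x10, 0x00, 0x05, 0x00, 0xff, 0xff, 0xff, 0xff,
    0x20, 0x00, 0x05, 0x00, 0xff, 0xff, 0xff, 0xff,
};

static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode up to 'max' entries; returns the count, or -1 if malformed. */
static int acl_decode(const uint8_t *buf, size_t len, struct acl_entry *out, int max)
{
    assert(buf != NULL && out != NULL);
    if (len < 4 || (len - 4) % 8 != 0 || le32(buf) != 2)
        return -1;
    int n = 0;
    for (size_t off = 4; off < len && n < max; off += 8, n++) {
        out[n].tag  = le16(buf + off);
        out[n].perm = le16(buf + off + 2);
        out[n].id   = le32(buf + off + 4);
    }
    return n;
}

/* Hypothetical compact form: the tag (a power of two) as a 3-bit index,
 * 3 rwx bits and the 32-bit id: 38 bits stored in 5 bytes. */
static void acl_pack5(const struct acl_entry *e, uint8_t out[5])
{
    unsigned kind = 0;
    while (kind < 6 && (1u << kind) != e->tag)
        kind++;
    assert(kind < 6 && e->perm < 8);
    uint64_t v = (uint64_t)kind << 35 | (uint64_t)e->perm << 32 | e->id;
    for (int i = 0; i < 5; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}
```

Decoded, the value reads user::rwx, user:100:r--, group::r-x, mask::r-x, other::r-x: the extra read right for the local user (uid 100, the hidden "d" in the text dump) plus the mask entry that POSIX requires whenever named entries are present. Each entry takes 64 bits in the xattr, against 38 bits in the packed form.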
2008-09-11 22:35 to actually respect extended acls stored as xattrs 2008-09-11 22:35 $ cat /proc/mounts 2008-09-11 22:35 /dev/root / ext3 rw,relatime,errors=continue,user_xattr,acl,data=ordered 0 0 2008-09-11 22:35 notice user_xattr,acl 2008-09-11 22:36 I wonder what acl does 2008-09-11 22:36 user xattr allows setting user.* xattrs (not selinux nor acl) 2008-09-11 22:36 could always read the fscking source 2008-09-11 22:36 hehe 2008-09-11 22:36 man 5 attr 2008-09-11 22:36 ah, well tux3 will let users set xattrs by default 2008-09-11 22:36 man 5 acl 2008-09-11 22:36 no reason not to 2008-09-11 22:37 I meant, what does it do in ext3 mount 2008-09-11 22:37 uhm, I think that's actually a vfs switch - so you have no choice ;-) 2008-09-11 22:37 it seems strange you'd have to ask for it 2008-09-11 22:37 notice you don't have to ask for htree ;-) 2008-09-11 22:37 I think it's the default nowadays 2008-09-11 22:37 that's a pretty big deal 2008-09-11 22:37 not sure though 2008-09-11 22:37 what is? 2008-09-11 22:37 not having to ask for htree 2008-09-11 22:38 you can mount -o remount,acl/noacl 2008-09-11 22:38 you get dir indexing by default 2008-09-11 22:38 even though it's horribly complex 2008-09-11 22:38 you've lost me - how do user_xattr and acl correspond to htree/dir indexing? 2008-09-11 22:38 doesn't 2008-09-11 22:38 ah 2008-09-11 22:38 just talking about defaults 2008-09-11 22:39 it makes no sense you'd have to ask for xattr or acl 2008-09-11 22:39 if you want to prevent your users from mucking around with it 2008-09-11 22:39 why would you?
2008-09-11 22:39 I believe the reason is
2008-09-11 22:40 (mind you both are ext3 options, not vfs I believe)
2008-09-11 22:40 true
2008-09-11 22:40 it requires a newer version of the ext3 superblock
2008-09-11 22:40 so does htree
2008-09-11 22:40 so you need it for backward compatibility
2008-09-11 22:40 ah
2008-09-11 22:40 that is the difference
2008-09-11 22:40 htree was forward compatible
2008-09-11 22:40 in case you don't want to generate xattrs on the fs
2008-09-11 22:40 right
2008-09-11 22:40 now it makes sense
2008-09-11 22:40 well
2008-09-11 22:40 tux3 doesn't have that problem
2008-09-11 22:40 forward? or backward?
2008-09-11 22:41 backward
2008-09-11 22:41 there is no backward, but of course we need to plan for forward
2008-09-11 22:41 could an old system r/w a newer fs with data stored in htree?
2008-09-11 22:41 yes
2008-09-11 22:41 cute, no?
2008-09-11 22:41 very tricky to make that happen
2008-09-11 22:41 ah, ok, then clearly need no option if it's better
2008-09-11 22:41 cute - yes!
2008-09-11 22:41 right, it was never worse, that was another cute thing
2008-09-11 22:41 wicked.
2008-09-11 22:42 because it would fall back to _exactly_ the old code at the crossover point
2008-09-11 22:42 hehe
2008-09-11 22:42 which turned out to be two dirent blocks
2008-09-11 22:42 at two blocks htree was already faster
2008-09-11 22:42 so it just creates the index when the first block overfills
2008-09-11 22:43 and at that point that's still a cheap op
2008-09-11 22:44 right
2008-09-11 22:44 htree is really fast
2008-09-11 22:45 dirops measured in tens of usec, even back then
2008-09-11 22:46 ugh, I wish I could do some coding... some real low level put-nose-in-the-deep low-level hackery
2008-09-11 22:47 you can, after your post ;-)
2008-09-11 22:47 that in itself will take a while to write
2008-09-11 22:48 it's job-related
2008-09-11 22:49 good security keeps data centers from catching fire
2008-09-11 22:50 uhm, not so sure about that ;-)
2008-09-11 22:50 they catch fire for totally non-security related reasons
2008-09-11 22:51 ok, _sometimes_ keeps data centers from catching fire
2008-09-11 22:51 could potentially, one day, keep a fire from starting
2008-09-11 22:52 you know alan cox figured out how to remotely disable the temperature override on intel processors?
2008-09-11 22:52 he could literally melt down processors remotely
2008-09-11 22:52 let me see
2008-09-11 22:53 make sure that isn't apocryphal
2008-09-11 22:53 pretty sure not
2008-09-11 22:55 heh
2008-09-11 22:56 I think our machines would lose power at that point, although I'm not sure
2008-09-11 22:56 care to let him try? ;-)
2008-09-11 22:56 it's rather odd that you can do that in software
2008-09-11 22:57 you'd think: overheat is overheat
2008-09-11 22:57 there should be no gate on it
2008-09-11 22:57 I think they have gates on stuff like that for a simple reason
2008-09-11 22:58 they don't know where the overheat point is until after they've built and tested the cpu
2008-09-11 22:58 some batches are better than others
2008-09-11 22:58 those get sold as higher frequency cpus, and/or with more cache
2008-09-11 22:59 today faster/more expensive/more top of the line cpus are cpus with fewer broken parts
2008-09-11 22:59 less of the cache disabled - because it didn't work, less power consumption at higher speed, less heat generated, better frequency tolerances etc
2008-09-11 23:00 fewer alu units disabled (there are always spares)
2008-09-11 23:00 fewer cores disabled
2008-09-11 23:00 already the 486sx was a dx with the floating point unit disabled because it failed qa
2008-09-11 23:01 -!- flips(~phillips@phunq.net) has left #tux3
2008-09-11 23:01 -!- flips(~phillips@phunq.net) has joined #tux3
2008-09-11 23:01 creating some excellent virus payload opportunities
2008-09-11 23:01 huh?
2008-09-11 23:02 course, it's usually not in the interest of a virus to physically destroy its host
2008-09-11 23:02 oh, as in burn the cpu virus?
2008-09-11 23:02 right, a real HACF instruction
2008-09-11 23:02 spread, mutate, exterminate later?
2008-09-11 23:02 HACF?
2008-09-11 23:02 Halt And Catch Fire
2008-09-11 23:02 $ cat /tmp/xattr.se | wc -l
2008-09-11 23:02 383
2008-09-11 23:03 so I have 383 different security.selinux xattr entries on my laptop
2008-09-11 23:03 highly compressible if that's what you're saying
2008-09-11 23:03 yup
2008-09-11 23:03 MaZe: how'd you count?
2008-09-11 23:04 sort | uniq -c | wc -l
2008-09-11 23:04 er, before that bit
2008-09-11 23:04 this is going to hurt your eyes ;-)
2008-09-11 23:04 find / | xargs getxattr (something)?
2008-09-11 23:04 find / -xdev -print0 | xargs -0 -n 1 getfattr -d -m . -e text -h --absolute-names -L | egrep '^security\.selinux=' | sort | uniq -c | wc -l
2008-09-11 23:04 thanks
2008-09-11 23:05 relatively unpainful line of shell
2008-09-11 23:05 I actually dumped to file in there
2008-09-11 23:05 no perl or awk ;-)
2008-09-11 23:05 so hope I didn't mix that up
2008-09-11 23:05 don't know perl
2008-09-11 23:05 avoid awk
2008-09-11 23:05 abuse egrep and sed instead
2008-09-11 23:05 sed is good for some nice chicken tracks
2008-09-11 23:05 base64 to base16 conversion in sed anyone?
2008-09-11 23:06 ouch, really?
2008-09-11 23:06 or the other way?
2008-09-11 23:06 or base32?
2008-09-11 23:06 or to binary? really ;-)
2008-09-11 23:06 ow :)
2008-09-11 23:07 (obviously you go through binary)
2008-09-11 23:08 so I split up my selinux strings, here are the results (there are 4 ':'-separated pieces)
2008-09-11 23:08 $ cat /tmp/xattr.se1
2008-09-11 23:08 679862 system_u
2008-09-11 23:08 24514 unconfined_u
2008-09-11 23:08 $ cat /tmp/xattr.se2
2008-09-11 23:08 704376 object_r
2008-09-11 23:08 $ cat /tmp/xattr.se3 | wc -l
2008-09-11 23:08 356
2008-09-11 23:08 $ cat /tmp/xattr.se4
2008-09-11 23:08 704376 s0
2008-09-11 23:09 so, yeah, badly needs a dict
2008-09-11 23:09 how about base63 to base15?
2008-09-11 23:09 just one less
2008-09-11 23:09 lol
2008-09-11 23:09 how hard could that be
2008-09-11 23:10 ah, but how useful would that be?
2008-09-11 23:10 ACTION feels that way about base64
2008-09-11 23:11 base64 is nice and concise, when you still want something printable (ie. a filename)
2008-09-11 23:11 but doesn't need to be remembered (still needs to be cut'n'pasteable though)
2008-09-11 23:11 see, I just didn't arrive on the planet as a sysop
2008-09-11 23:11 so yeah files with names being base64 encoding of content hash...
2008-09-11 23:12 useful stuff ;-)
2008-09-11 23:12 good thing lots of sysops like to hang around filesystem projects
2008-09-11 23:12 makes for some fancy scripts
2008-09-11 23:12 $ cat /tmp/xattr.se3 | sort -n | tail -n 8
2008-09-11 23:12 6125 modules_object_t
2008-09-11 23:12 13658 locale_t
2008-09-11 23:12 16690 man_t
2008-09-11 23:12 25337 src_t
2008-09-11 23:12 26456 lib_t
2008-09-11 23:12 32875 user_home_t
2008-09-11 23:12 109266 usr_t
2008-09-11 23:12 461436 default_t
2008-09-11 23:13 so you can see the long tail distribution of the 3rd element
2008-09-11 23:13 that's with all the _t ?
2008-09-11 23:13 type
2008-09-11 23:13 right, which I associate with source code
2008-09-11 23:13 or somit
2008-09-11 23:13 is that source?
2008-09-11 23:13 u = user
2008-09-11 23:13 r = role
2008-09-11 23:14 t = type
2008-09-11 23:14 ah
2008-09-11 23:14 parts of the triplet have different suffixes
2008-09-11 23:14 it's basically always .*_u:.*_r:.*_t
2008-09-11 23:14 let me see, that's MAC terminology
2008-09-11 23:14 not surprising there...
2008-09-11 23:14 not sure what exactly the last quad (s0) is
2008-09-11 23:14 possibly range?
2008-09-11 23:15 anyway, on a bigger box, there would obviously be more
2008-09-11 23:15 but we're not talking about a lot here - just the order of hundreds to thousands
2008-09-11 23:16 instructive
2008-09-11 23:16 I'm going to have to absorb this as we go
2008-09-11 23:16 but anyway
2008-09-11 23:16 I think the last quad isn't used for anything yet but is reserved for future stuff?
2008-09-11 23:16 seem to be on the right track, just by dumb luck
2008-09-11 23:16 maybe, it's newer than the rest
2008-09-11 23:16 err
2008-09-11 23:16 no, that's not right
2008-09-11 23:16 hold on
2008-09-11 23:17 oh, I know
2008-09-11 23:17 if user sets a selinux context which isn't atomized - he gets eperm, if root, the atom table is auto-extended
2008-09-11 23:18 deals with dos correctly
2008-09-11 23:18 or make it a mount option, selinux-auto-atomize={always,if-root,never}
2008-09-11 23:21 atomize is the new tux3 thing?
2008-09-11 23:24 215 different selinux states here
2008-09-11 23:25 :-)
2008-09-11 23:25 well atomize is descriptive in what it does to the number of bytes
2008-09-11 23:25 heh
2008-09-11 23:25 what is the average length of an acl?
2008-09-11 23:26 and what is the percentage of files that have them?
2008-09-11 23:26 selinux? all
2008-09-11 23:26 extended acl? almost none
2008-09-11 23:26 avg length of selinux acl... working
2008-09-11 23:27 security.selinux="...:...:...:..\000"[newline] - avg at 53
2008-09-11 23:27 so like 48 in memory
2008-09-11 23:28 extended acl:
2008-09-11 23:28 system.posix_acl_access="..." with minimum length of
2008-09-11 23:29 so we will compress those about 25/1
2008-09-11 23:29 61 or so
2008-09-11 23:29 30/1 then
2008-09-11 23:30 minimum length
2008-09-11 23:30 oh
2008-09-11 23:30 well
2008-09-11 23:30 well selinux is everywhere - and compressed to 4 bytes
2008-09-11 23:30 "significant" metadata compression coming up
2008-09-11 23:30 acl is not everywhere...
2008-09-11 23:30 ah
2008-09-11 23:30 so only 2/1
2008-09-11 23:30 when we compress the 4 bytes to 2 byte atoms
2008-09-11 23:30 in a selinux system every file has to have a selinux context
2008-09-11 23:30 but not every file has to have extended acls
2008-09-11 23:31 I see
2008-09-11 23:31 so it depends on how you do the selinux compression
2008-09-11 23:31 how badly you want to compress it
2008-09-11 23:31 look up every xattr body in the atom table
2008-09-11 23:31 easy
2008-09-11 23:31 the most relaxed method uses 16 bytes
2008-09-11 23:31 -!- stargazr5(~gauravstt@59.95.38.255) has joined #tux3
2008-09-11 23:31 the most compressed 2 bytes
2008-09-11 23:32 I see
2008-09-11 23:32 varying levels of extendability
2008-09-11 23:32 future-proofing
2008-09-11 23:32 you could do selinux compression like standard unix privileges compression
2008-09-11 23:32 yes, but how big do you make the bitfields?
2008-09-11 23:32 i.e. if it's the same as the parent directory, don't store it in the inode
2008-09-11 23:32 so they've already done a pretty good job of compressing bodies
2008-09-11 23:32 it's the xattr labels that will stick out
2008-09-11 23:32 you don't need the xattr labels at all
2008-09-11 23:33 the selinux and acl xattr labels are trivially obvious well known labels, that you fake on xattr access
2008-09-11 23:33 konrad, not there's no way to find the partent directory actually
2008-09-11 23:33 so if we do that, it will be the containing inode table block
2008-09-11 23:33 or region
2008-09-11 23:34 "note there's no way to find the parent directory actually"
2008-09-11 23:34 so you don't need to store them as xattrs anyway
2008-09-11 23:34 hardlinks
2008-09-11 23:34 hard to type and eat ramyun at the same time
2008-09-11 23:34 inode can exist in multiple dirs
2008-09-11 23:35 time to do my atom refcount post
2008-09-11 23:35 I'll post the design, then implement
2008-09-11 23:35 like a good boy should
2008-09-11 23:36 also, note that xattrs
2008-09-11 23:36 basically come in a couple of varieties
2008-09-11 23:36 security.* system.* trusted.* user.*
2008-09-11 23:37 so it's worthwhile to compress {security|system|trusted|user} regardless
2008-09-11 23:37 flips: er, sorry, yes, that.
2008-09-11 23:38 security is basically for selinux (& other such systems), system for extended acls (& other such system - capabilities), user for users, trusted (accessible to CAP_SYS_ADMIN)
2008-09-11 23:39 ACTION trashes the latest lucasarts game on slashdot
2008-09-11 23:39 got to keep focussed here
2008-09-11 23:39 what's the game?
2008-09-11 23:39 I just received sport
2008-09-11 23:39 erm, spore
2008-09-11 23:40 spore is on my never play list
2008-09-11 23:40 with the drm
2008-09-11 23:40 even if they're doing it to windows boxes
2008-09-11 23:40 there's drm?
2008-09-11 23:40 hmm, well I have a mac
2008-09-11 23:40 it fscks with the registry
2008-09-11 23:40 for copy protection
2008-09-11 23:41 hacks the os security
2008-09-11 23:41 details are all over the web
2008-09-11 23:41 not that hacking windows security is all that leet... still it's agin the law
2008-09-11 23:42 hmm, all my windoze are safely in kvm cages
2008-09-11 23:42 you having fun with spore?
2008-09-11 23:42 makes for the fastest windows installs ever
2008-09-11 23:43 haven't launched it yet ;-)
2008-09-11 23:43 I had fun with civ revolutions
2008-09-11 23:43 never played a civ game before
2008-09-11 23:43 will need to reboot into mac
2008-09-11 23:43 surprised me, I didn't think I'd like it
2008-09-11 23:43 well
2008-09-11 23:43 post coming, for realz
2008-09-11 23:46 man setxattr: If extended attributes are not supported by the filesystem, or are disabled, errno is set to ENOTSUP.
2008-09-11 23:53 -!- kd(kdpict@118.94.53.35) has joined #tux3
2008-09-11 23:54 yah
2008-09-11 23:54 just replied