<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-2493250806093724011</id><updated>2011-12-20T15:07:10.185-06:00</updated><title type='text'>Backup Storage System</title><subtitle type='html'>My thoughts, and eventual code, concerning a Database centric backup system.  The database would track all backups at the file level, keeping a copy of all files needed to do a complete restore of a system, or a subset of those files if desired.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://agcbsm.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2493250806093724011/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://agcbsm.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>naclosagc</name><uri>http://www.blogger.com/profile/03306870871009241032</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>8</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-2493250806093724011.post-1263381906389701946</id><published>2010-01-15T13:27:00.002-06:00</published><updated>2010-01-15T13:31:38.337-06:00</updated><title type='text'>BSM Progress</title><content type='html'>It has been a while since I posted anything here.  
I lost my code in an upgrade mishap, so I started over.  I changed the design of the system to do inline deduplication on blocks coming into the system.  I have not gotten far enough yet to see how well the deduplication works and how much space it saves, but that is coming soon.  I have the client talking to the server, and that is a start.&lt;br /&gt;&lt;br /&gt;I am reading Code Complete by Steve McConnell to get some good ideas on coding style.  I have a couple more books coming, because I realized after starting this that while I know a little bit about C language programming, I do not know enough about making it readable and chunking it up into smaller pieces to make it easier to manage.&lt;br /&gt;&lt;br /&gt;If anyone has suggestions about books or PDFs that can help in this area, please let me know.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2493250806093724011-1263381906389701946?l=agcbsm.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://agcbsm.blogspot.com/feeds/1263381906389701946/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2493250806093724011&amp;postID=1263381906389701946&amp;isPopup=true' title='35 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2493250806093724011/posts/default/1263381906389701946'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2493250806093724011/posts/default/1263381906389701946'/><link rel='alternate' type='text/html' href='http://agcbsm.blogspot.com/2010/01/bsm-progress.html' title='BSM Progress'/><author><name>naclosagc</name><uri>http://www.blogger.com/profile/03306870871009241032</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>35</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2493250806093724011.post-1851139823017602292</id><published>2009-12-09T16:16:00.005-06:00</published><updated>2009-12-09T17:22:27.041-06:00</updated><title type='text'>Linux Filesystem Benchmarks</title><content type='html'>I wanted to see how various filesystems in Linux stacked up against each other.  So, I decided to benchmark them.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The filesystems I am benchmarking are:  ext3, ext4, xfs, jfs, reiserfs, btrfs, and nilfs2.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The system I am using to do the benchmarking:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Gigabyte MA790FXT-UD5P Motherboard&lt;/div&gt;&lt;div&gt;AMD Phenom II 955 Quad Core 3.2GHz&lt;/div&gt;&lt;div&gt;4 x Mushkin Silverline PC3-10666 9-9-9-24 1333 2GB Ram&lt;/div&gt;&lt;div&gt;Gigabyte GV-N210OC-512I Geforce 210 512MB 64-bit PCIE Video Card&lt;/div&gt;&lt;div&gt;LG 22x DVD Sata Burner&lt;/div&gt;&lt;div&gt;2 x WD Caviar Blue 320GB 7200RPM Sata Drives (OS/Other)&lt;/div&gt;&lt;div&gt;4 x WD Caviar Blue 80GB 7200RPM Sata Drives (Data)&lt;/div&gt;&lt;div&gt;4 x Patriot Memory 32GB Sata SSD (Database)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Gentoo Linux 10.1&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The disk space used is a software RAID 0 composed of 4 partition slices of 4864 cylinders (37.3GB) from the 80GB Hard Drives.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I used the fileio benchmarks in Sysbench 0.4.10 to do these tests.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I created a script that formats the filesystem, mounts it, runs the sysbench prepare statement, clears the filecache, and runs the benchmark.  This is done 5 times for each combination of filesystem and I/O operation mode tested, and the results are averaged.  
Each filesystem is created with its default values.&lt;/div&gt;&lt;div&gt;&lt;hr /&gt;&lt;/div&gt;&lt;div&gt;SEQWR&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;SEQWR is the sequential write benchmark&lt;/div&gt;&lt;div&gt;&lt;img src="http://www.agcbsm.org/b20091209/seqwr.jpg" alt="SEQWR" /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;XFS is the clear winner here, with ext4 following closely.  NILFS2 was really bad, but I have to attribute this to the newness of it, and that it's not production ready.  It performed poorly in every test, with one notably weird exception (SEQRD) which I will discuss later.  So ignoring NILFS2, JFS was the worst at 2.4x the best.&lt;/div&gt;&lt;div&gt;&lt;hr /&gt;&lt;/div&gt;&lt;div&gt;SEQREWR&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;SEQREWR is the sequential rewrite benchmark&lt;/div&gt;&lt;div&gt;&lt;img src="http://www.agcbsm.org/b20091209/seqrewr.jpg" alt="SEQREWR" /&gt;&lt;/div&gt;&lt;div&gt;JFS just beat out XFS on this one, with EXT3 having a particularly bad showing here at 7x the best.&lt;/div&gt;&lt;div&gt;&lt;hr /&gt;&lt;/div&gt;&lt;div&gt;SEQRD&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;SEQRD is the sequential read benchmark&lt;/div&gt;&lt;div&gt;&lt;img src="http://www.agcbsm.org/b20091209/seqrd.jpg" alt="SEQRD" /&gt;&lt;/div&gt;&lt;div&gt;I cannot explain why NILFS2 chose this test to shine on, but I suspect either it is something I am missing, or that NILFS2 is just really good at this.  All the rest of the filesystems were virtually equal here.  If your workload is sequential read, it seems any filesystem would do.&lt;/div&gt;&lt;div&gt;&lt;hr /&gt;&lt;/div&gt;&lt;div&gt;RNDRD&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;RNDRD is the random read benchmark&lt;/div&gt;&lt;div&gt;&lt;img src="http://www.agcbsm.org/b20091209/rndrd.jpg" alt="RNDRD" /&gt;&lt;/div&gt;&lt;div&gt;The winner here is REISERFS, with JFS and EXT3 close behind.  
XFS was the worst at 2.8x the best.&lt;/div&gt;&lt;div&gt;&lt;hr /&gt;&lt;/div&gt;&lt;div&gt;RNDWR&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;RNDWR is the random write benchmark&lt;/div&gt;&lt;div&gt;&lt;img src="http://www.agcbsm.org/b20091209/rndwr.jpg" alt="RNDWR" /&gt;&lt;/div&gt;&lt;div&gt;I can't explain the fantastic showing here on REISERFS, unless it buffers the write and returns without having sync'd it to disk.  Setting that aside, EXT3 showed well here, followed by BTRFS and JFS.  EXT4 was the worst at 3.8x the best.&lt;/div&gt;&lt;div&gt;&lt;hr /&gt;&lt;/div&gt;&lt;div&gt;RNDRW&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;RNDRW is the combined random read/write benchmark&lt;/div&gt;&lt;div&gt;&lt;img src="http://www.agcbsm.org/b20091209/rndrw.jpg" alt="RNDRW" /&gt;&lt;/div&gt;&lt;div&gt;REISERFS won here, followed closely by JFS.  The loser here is XFS at 4.9x the best.&lt;/div&gt;&lt;div&gt;&lt;hr /&gt;&lt;/div&gt;&lt;div&gt;Conclusion: REISERFS and JFS are pretty close contenders for first place, followed by BTRFS and EXT4.  Good old EXT3 would be my pick for fifth, leaving XFS and the still immature but interesting log-based filesystem NILFS2 in last place.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;As always, your mileage may vary.  I could have done some tuning, as some of the filesystems have parameters for stride and stripe width for raid devices, but once I started tuning, I wouldn't know where to stop, so I thought it was fairer to compare them with their default values.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I plan on testing these same filesystems and I/O patterns on SSD disks next.  Also, I am going to test BTRFS compression, but I am not sure yet if that is interesting enough to post about. 
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2493250806093724011-1851139823017602292?l=agcbsm.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://agcbsm.blogspot.com/feeds/1851139823017602292/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2493250806093724011&amp;postID=1851139823017602292&amp;isPopup=true' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2493250806093724011/posts/default/1851139823017602292'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2493250806093724011/posts/default/1851139823017602292'/><link rel='alternate' type='text/html' href='http://agcbsm.blogspot.com/2009/12/linux-filesystem-benchmarks.html' title='Linux Filesystem Benchmarks'/><author><name>naclosagc</name><uri>http://www.blogger.com/profile/03306870871009241032</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2493250806093724011.post-133652020653675226</id><published>2009-11-30T09:13:00.004-06:00</published><updated>2009-11-30T10:50:47.986-06:00</updated><title type='text'>ZFS on SAN - multiple slices from the same raid group?</title><content type='html'>&lt;span style="font-family:courier new;"&gt;Until recently, I never had much to do with Solaris.   I have heard a lot about ZFS and its many advantages, but when I read about the upcoming deduplication in ZFS, I had to try it out.  
I talked to some people here who are more familiar with ZFS than I am, but the indication from them was that they were getting away from ZFS.  High I/O rates were bringing it to its knees.  We are using an EMC SAN here, and while reading about ZFS it occurred to me that what may be happening is that no one planned out the raid groups so that each slice of the SAN came from a separate raid group.  I decided to test this out.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;I did not have a spare system to run OpenSolaris on, and I needed a system with more SATA ports so I could have more hard disk drives.  Here is what I came up with:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Gigabyte MA790FXT-UD5P Motherboard&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;AMD Phenom II 955 Quad Core 3.2GHz&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;4 x Mushkin Silverline PC3-10666 9-9-9-24 1333 2GB Ram&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Gigabyte GV-N210OC-512I Geforce 210 512MB 64-bit PCIE Video Card&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;LG 22x DVD Sata Burner&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;2 x WD Caviar Blue 320GB 7200RPM Sata Drives (OS/Other)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;4 x WD Caviar Blue 80GB 7200RPM Sata Drives (Data)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;4 x Patriot Memory 32GB Sata SSD (Database)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Gentoo Linux 10.1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;OpenSolaris 200906&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;I decided to test this theory with a 4864 cylinder partition on one of the 80GB drives, and two 2432 cylinder partitions on 
another 80GB drive.  Here are the two pools:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;   pool: storage&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt; state: ONLINE&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt; scrub: none requested&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;config:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;        NAME        STATE     READ WRITE CKSUM&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;        storage     ONLINE       0     0     0&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;          c7t1d0p3  ONLINE       0     0     0&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  pool: storages&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt; state: ONLINE&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt; scrub: none requested&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;config:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;        NAME        STATE     READ WRITE CKSUM&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;        storages    ONLINE       0     0     0&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;          c7t0d0p3  ONLINE       0     0     0&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;          c7t0d0p4  ONLINE       0     0     0&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;I decided to use iozone as the test.  I was particularly interested in the filesizes of 8GB and 16GB, since I have 8GB of memory and wanted to take filecache out of the picture as much as possible.  
The results were a little surprising:&lt;br /&gt;&lt;br /&gt;&lt;table border="1"&gt;&lt;br /&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;br /&gt;&lt;/td&gt;&lt;td&gt;One Partition&lt;/td&gt;&lt;td&gt;&lt;br /&gt;&lt;/td&gt;&lt;td&gt;Two Partitions&lt;/td&gt;&lt;td&gt;&lt;br /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;&lt;br /&gt;&lt;/td&gt;&lt;td&gt;8GB Filesize&lt;/td&gt;&lt;td&gt;16GB Filesize&lt;/td&gt;&lt;td&gt;8GB Filesize&lt;/td&gt;&lt;td&gt;16GB Filesize&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;Writer Report (avg)&lt;/td&gt;&lt;td&gt;53484KB/s&lt;/td&gt;&lt;td&gt;49143KB/s&lt;/td&gt;&lt;td&gt;49781KB/s&lt;/td&gt;&lt;td&gt;45725KB/s&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;Reader Report (avg)&lt;/td&gt;&lt;td&gt;38038KB/s&lt;/td&gt;&lt;td&gt;36979KB/s&lt;/td&gt;&lt;td&gt;98214KB/s&lt;/td&gt;&lt;td&gt;45676KB/s&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;Random Read Report (avg)&lt;/td&gt;&lt;td&gt;38869KB/s&lt;/td&gt;&lt;td&gt;12755KB/s&lt;/td&gt;&lt;td&gt;24957KB/s&lt;/td&gt;&lt;td&gt;11710KB/s&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;Random Write Report (avg)&lt;/td&gt;&lt;td&gt;30250KB/s&lt;/td&gt;&lt;td&gt;25258KB/s&lt;/td&gt;&lt;td&gt;27607KB/s&lt;/td&gt;&lt;td&gt;23635KB/s&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;Stride Read Report (avg)&lt;/td&gt;&lt;td&gt;76695KB/s&lt;/td&gt;&lt;td&gt;31510KB/s&lt;/td&gt;&lt;td&gt;57693KB/s&lt;/td&gt;&lt;td&gt;34278KB/s&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;Fwrite Report (avg)&lt;/td&gt;&lt;td&gt;53784KB/s&lt;/td&gt;&lt;td&gt;48968KB/s&lt;/td&gt;&lt;td&gt;29115KB/s&lt;/td&gt;&lt;td&gt;28515KB/s&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;Fread Report (avg)&lt;/td&gt;&lt;td&gt;68970KB/s&lt;/td&gt;&lt;td&gt;49275KB/s&lt;/td&gt;&lt;td&gt;98283KB/s&lt;/td&gt;&lt;td&gt;45997KB/s&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;br /&gt;Here is what jumps out at me:&lt;br /&gt;&lt;br /&gt;- Performance for 
sequential writing with 2 partitions versus 1 partition was approximately 7% slower&lt;br /&gt;&lt;br /&gt;- 2 partitions vs 1 partition for the sequential read was a winner with about 2.5x performance on the 8GB filesize, and about a 23.5% increase for 16GB&lt;br /&gt;&lt;br /&gt;- Random Read for the 8GB filesize showed almost a 36% decrease in performance, while the 16GB filesize only showed about an 8% decrease&lt;br /&gt;&lt;br /&gt;- Random Write showed about a 9% decrease in performance for the 8GB filesize, vs 6% for the 16GB filesize.&lt;br /&gt;&lt;br /&gt;- Stride Read showed the opposite of Sequential Read at the 8GB filesize, where 2 partitions were about 25% slower than 1, maybe because the stride read masked the OS filecache?&lt;br /&gt;&lt;br /&gt;- Fwrite showed almost a 2 to 1 advantage for the 1 partition configuration&lt;br /&gt;&lt;br /&gt;- Fread showed roughly a 42% increase for 2 partitions at the 8GB filesize, but only a small difference (about 7% slower) at the 16GB filesize.&lt;br /&gt;&lt;br /&gt;I am guessing that some of the advantage in sequential read has to do with the OS doing more readahead for two "disks", as the OS sees it, vs one "disk".&lt;br /&gt;&lt;br /&gt;This test was not exhaustive, and did not include multiple readers and writers, but my gut feeling is to stay away from multiple slices in one raid group for writers, and mostly for readers as well.  The exception is sequential reads, where tweaking the OS readahead may give better read performance.&lt;br /&gt;&lt;br /&gt;I am planning to do more benchmarks on ZFS, with SSD disk and spinning disk, with record sizes, compression, and deduplication when it is available.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2493250806093724011-133652020653675226?l=agcbsm.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://agcbsm.blogspot.com/feeds/133652020653675226/comments/default' 
title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2493250806093724011&amp;postID=133652020653675226&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2493250806093724011/posts/default/133652020653675226'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2493250806093724011/posts/default/133652020653675226'/><link rel='alternate' type='text/html' href='http://agcbsm.blogspot.com/2009/11/zfs-on-san-multiple-slices-from-same.html' title='ZFS on SAN - multiple slices from the same raid group?'/><author><name>naclosagc</name><uri>http://www.blogger.com/profile/03306870871009241032</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2493250806093724011.post-7524728766376572873</id><published>2009-04-06T15:14:00.004-05:00</published><updated>2009-04-06T15:22:57.220-05:00</updated><title type='text'>Performance Problems</title><content type='html'>I am having MySQL performance problems with my current schema.  
Here is a rundown of what the inserter does:&lt;br /&gt;&lt;br /&gt;- gets the filename, filesize and hash for a file&lt;br /&gt;- checks the objects table to see if that combination already exists&lt;br /&gt;-     if it exists, skip to next step&lt;br /&gt;-     if it does not exist, insert it&lt;br /&gt;- inserts the information into the files table&lt;br /&gt;&lt;br /&gt;Here are the CREATE TABLE definitions for the two tables:&lt;br /&gt;&lt;br /&gt;CREATE TABLE `objects` (&lt;br /&gt; `objectid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,&lt;br /&gt; `filename` varchar(256) COLLATE latin1_bin DEFAULT NULL,&lt;br /&gt; `filesize` bigint(20) unsigned DEFAULT NULL,&lt;br /&gt; `hash` varchar(32) COLLATE latin1_bin DEFAULT NULL,&lt;br /&gt; PRIMARY KEY (`objectid`),&lt;br /&gt; UNIQUE KEY `nsh` (`filename`,`filesize`,`hash`)&lt;br /&gt;) ENGINE=MyISAM AUTO_INCREMENT=7037849 DEFAULT CHARSET=latin1 COLLATE=latin1_bin&lt;br /&gt;&lt;br /&gt;CREATE TABLE `files` (&lt;br /&gt; `path` varchar(4096) COLLATE latin1_bin DEFAULT NULL,&lt;br /&gt; `filename` varchar(256) COLLATE latin1_bin DEFAULT NULL,&lt;br /&gt; `filesize` bigint(20) unsigned DEFAULT NULL,&lt;br /&gt; `hash` varchar(32) COLLATE latin1_bin DEFAULT NULL,&lt;br /&gt; `backuptime` datetime DEFAULT NULL,&lt;br /&gt; `status` enum('Active','Inactive','Deleted') COLLATE latin1_bin DEFAULT NULL,&lt;br /&gt; `objectid` bigint(20) unsigned DEFAULT NULL&lt;br /&gt;) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_bin&lt;br /&gt;&lt;br /&gt;I currently have a files table per backup client node.  On my test system, I can run the inserts sequentially for 15 nodes (37 or so processes for multiple filesystem clients), and it runs in about 50 minutes.  When I run them at the same time, the run takes many times longer.  The bottleneck looks like the objects table, since I tried "insert delayed" and that cut the time back down to around the sequential time.  
The downside of delayed inserts is that a crash can lose data.&lt;br /&gt;&lt;br /&gt;Does anyone have any ideas as to what I am doing wrong?  If it matters, I am running a dual core AMD 5800+, 4GB memory, and a 4 disk raid-0 SSD Array.&lt;br /&gt;&lt;br /&gt;Thanks for any input anyone may have.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2493250806093724011-7524728766376572873?l=agcbsm.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://agcbsm.blogspot.com/feeds/7524728766376572873/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2493250806093724011&amp;postID=7524728766376572873&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2493250806093724011/posts/default/7524728766376572873'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2493250806093724011/posts/default/7524728766376572873'/><link rel='alternate' type='text/html' href='http://agcbsm.blogspot.com/2009/04/performance-problems.html' title='Performance Problems'/><author><name>naclosagc</name><uri>http://www.blogger.com/profile/03306870871009241032</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2493250806093724011.post-4500326200235102688</id><published>2009-04-06T15:08:00.003-05:00</published><updated>2009-04-06T15:13:41.445-05:00</updated><title type='text'>Initial Stats from 15 Servers</title><content type='html'>I picked 15 servers at my workplace to get some initial data on.  I tried to pick different kinds of servers rather than multiple copies of the same server, such as those kept for high availability.  
The data I gathered showed fairly good savings from a poor man's data dedup. &lt;br /&gt;&lt;br /&gt;My "poor man's dedup" consists of saving the backup files based on the filename, filesize, and MD5 hash of the file contents.  This provided savings even on single servers, as there are some number of identically named files on the same system under different paths.&lt;br /&gt;&lt;br /&gt;Here are my numbers from the 15 servers:&lt;br /&gt;&lt;br /&gt; Total Files: 13753410&lt;br /&gt;Total Objects: 7037848&lt;br /&gt;File Savings: 6715562&lt;br /&gt;Percentage Savings: 48.8283414804&lt;br /&gt;&lt;br /&gt;Total File Size: 1747437157724&lt;br /&gt;Total Object Size: 1134826850745&lt;br /&gt;Size Savings: 612610306979&lt;br /&gt;Percentage Savings: 35.0576445208&lt;br /&gt;&lt;br /&gt;A 35% space savings seems pretty incredible to me, for as little as I had to put into making it happen.  Some of the more aggressive data dedup algorithms search files for long common strings, which certainly might improve the amount of savings, but at the cost of more CPU and IO processing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2493250806093724011-4500326200235102688?l=agcbsm.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://agcbsm.blogspot.com/feeds/4500326200235102688/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2493250806093724011&amp;postID=4500326200235102688&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2493250806093724011/posts/default/4500326200235102688'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2493250806093724011/posts/default/4500326200235102688'/><link rel='alternate' type='text/html' href='http://agcbsm.blogspot.com/2009/04/initial-stats-from-15-servers.html' 
title='Initial Stats from 15 Servers'/><author><name>naclosagc</name><uri>http://www.blogger.com/profile/03306870871009241032</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2493250806093724011.post-557041196140911479</id><published>2009-03-02T20:48:00.003-06:00</published><updated>2009-03-02T21:25:14.464-06:00</updated><title type='text'>Recursing into subdirectories</title><content type='html'>One of the things that needs to be done when backing up files is to determine what files exist on the server. I had written code to recurse into subdirectories, but it was very slow. It also basically only gave the filename, which necessitated a call to stat to get the important information. Then I discovered the library functions ftw and nftw. These calls do the work of recursing into the subdirectories, and also provide the stat structure for each file. They are also very much faster than my original code. 
Here is the code I wrote to do that:&lt;br /&gt;&lt;br /&gt;&lt;pre class="prettyprint"&gt;&lt;br /&gt;#define _XOPEN_SOURCE 500&lt;br /&gt;#define _FILE_OFFSET_BITS 64 /* large-file support; used here in place of nftw64() */&lt;br /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;#include &amp;lt;stdlib.h&amp;gt;&lt;br /&gt;#include &amp;lt;string.h&amp;gt;&lt;br /&gt;#include &amp;lt;stdint.h&amp;gt;&lt;br /&gt;#include &amp;lt;sys/types.h&amp;gt;&lt;br /&gt;#include &amp;lt;unistd.h&amp;gt;&lt;br /&gt;#include &amp;lt;sys/stat.h&amp;gt;&lt;br /&gt;#include &amp;lt;ftw.h&amp;gt;&lt;br /&gt;&lt;br /&gt;/* Called by nftw() once per entry; prints the path and its stat fields. */&lt;br /&gt;static int&lt;br /&gt;display_info(const char *fpath, const struct stat *sb,&lt;br /&gt;             int tflag, struct FTW *ftwbuf)&lt;br /&gt;{&lt;br /&gt;    (void)ftwbuf; /* not used here */&lt;br /&gt;    /* Casts make each argument match its printf format on any platform. */&lt;br /&gt;    printf("p:%s ", fpath);&lt;br /&gt;    printf("tf:%d ", tflag);&lt;br /&gt;    printf("dev:%lu ", (unsigned long)sb-&gt;st_dev);&lt;br /&gt;    printf("ino:%lu ", (unsigned long)sb-&gt;st_ino);&lt;br /&gt;    printf("m:%o ", (unsigned int)sb-&gt;st_mode);&lt;br /&gt;    printf("nl:%lu ", (unsigned long)sb-&gt;st_nlink);&lt;br /&gt;    printf("u:%lu ", (unsigned long)sb-&gt;st_uid);&lt;br /&gt;    printf("g:%lu ", (unsigned long)sb-&gt;st_gid);&lt;br /&gt;    printf("rd:%lx ", (unsigned long)sb-&gt;st_rdev);&lt;br /&gt;    printf("s:%lld ", (long long)sb-&gt;st_size);&lt;br /&gt;    printf("bs:%ld ", (long)sb-&gt;st_blksize);&lt;br /&gt;    printf("b:%lld ", (long long)sb-&gt;st_blocks);&lt;br /&gt;    printf("ta:%ld ", (long)sb-&gt;st_atime);&lt;br /&gt;    printf("tm:%ld ", (long)sb-&gt;st_mtime);&lt;br /&gt;    printf("tc:%ld ", (long)sb-&gt;st_ctime);&lt;br /&gt;    printf("\n");&lt;br /&gt;    return 0; /* To tell nftw() to continue */&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;int&lt;br /&gt;main(int argc, char *argv[])&lt;br /&gt;{&lt;br /&gt;    int flags = FTW_MOUNT; /* stay within one filesystem */&lt;br /&gt;&lt;br /&gt;    /* OR the optional flags in so FTW_MOUNT is not overwritten */&lt;br /&gt;    if (argc &gt; 2 &amp;amp;&amp;amp; strchr(argv[2], 'd') != NULL)&lt;br /&gt;        flags |= FTW_DEPTH;&lt;br /&gt;    if (argc &gt; 2 &amp;amp;&amp;amp; strchr(argv[2], 'p') != NULL)&lt;br /&gt;        flags |= FTW_PHYS;&lt;br /&gt;&lt;br /&gt;    /* Walk from argv[1] (default "."), using at most 80 open descriptors */&lt;br /&gt;    if (nftw((argc &amp;lt; 2) ? "." : argv[1], display_info, 80, flags) == -1) {&lt;br /&gt;        perror("nftw");&lt;br /&gt;        exit(EXIT_FAILURE);&lt;br /&gt;    }&lt;br /&gt;    exit(EXIT_SUCCESS);&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2493250806093724011-557041196140911479?l=agcbsm.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://agcbsm.blogspot.com/feeds/557041196140911479/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2493250806093724011&amp;postID=557041196140911479&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2493250806093724011/posts/default/557041196140911479'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2493250806093724011/posts/default/557041196140911479'/><link rel='alternate' type='text/html' href='http://agcbsm.blogspot.com/2009/03/recursing-into-subdirectories.html' title='Recursing into subdirectories'/><author><name>naclosagc</name><uri>http://www.blogger.com/profile/03306870871009241032</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry>
<entry><id>tag:blogger.com,1999:blog-2493250806093724011.post-748222074452767289</id><published>2008-10-16T09:12:00.003-05:00</published><updated>2008-11-30T17:30:20.106-06:00</updated><title type='text'>Books that I Use</title><content type='html'>I thought it might be interesting to list the books I use as reference while programming for this project.  In no particular order:&lt;br /&gt;&lt;br /&gt;SQL in a Nutshell - Kevin E. 
Kline - This book was not what I expected, but it is a good reference for SQL and distinguishes between various flavors: DB2, MySQL, Oracle, and SQL Server mostly.&lt;br /&gt;&lt;br /&gt;High Performance MySQL - Jeremy D. Zawodny &amp;amp; Derek J. Balling - Good book on getting performance out of MySQL&lt;br /&gt;&lt;br /&gt;Advanced Programming in the UNIX Environment - W. Richard Stevens&lt;br /&gt;&lt;br /&gt;UNIX Network Programming - W. Richard Stevens&lt;br /&gt;&lt;br /&gt;The C Programming Language - Kernighan and Ritchie&lt;br /&gt;&lt;br /&gt;Programming with POSIX Threads - David R. Butenhof&lt;br /&gt;&lt;br /&gt;I'm sure there are more, but that's all I can think of right now.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2493250806093724011-748222074452767289?l=agcbsm.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://agcbsm.blogspot.com/feeds/748222074452767289/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2493250806093724011&amp;postID=748222074452767289&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2493250806093724011/posts/default/748222074452767289'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2493250806093724011/posts/default/748222074452767289'/><link rel='alternate' type='text/html' href='http://agcbsm.blogspot.com/2008/10/books-that-i-use.html' title='Books that I Use'/><author><name>naclosagc</name><uri>http://www.blogger.com/profile/03306870871009241032</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2493250806093724011.post-522288327798706443</id><published>2008-09-30T13:29:00.003-05:00</published><updated>2008-09-30T13:36:17.635-05:00</updated><title type='text'>My Thoughts on Creating a Backup Storage System</title><content type='html'>I have worked with backups and network backup systems for about 13 years now.  I decided to investigate what it would take to write such a system, using MySQL for the backend database.&lt;br /&gt;&lt;br /&gt;The first thing I discovered is that traversing the directories is a slow process: many times faster than actually backing up the files, but still slow.  I then discovered a nifty library function called nftw that recurses into directories and gives a stat structure and other information for each file found.  It is much, much faster than my own traversal code, and you get the stat structure to boot.&lt;br /&gt;&lt;br /&gt;The other thing I have been investigating is a poor man's data de-duplication.  I've run some experiments on servers here at work, creating a hash of each file.  
After looking at millions of files, I settled on the MD5 hash, with no duplicate hashes found so far.&lt;br /&gt;&lt;br /&gt;I will post later about the code to recurse into directories, the hashing, some timings, and why I chose MD5.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2493250806093724011-522288327798706443?l=agcbsm.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://agcbsm.blogspot.com/feeds/522288327798706443/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=2493250806093724011&amp;postID=522288327798706443&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2493250806093724011/posts/default/522288327798706443'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2493250806093724011/posts/default/522288327798706443'/><link rel='alternate' type='text/html' href='http://agcbsm.blogspot.com/2008/09/my-thoughts-on-creating-backup-storage.html' title='My Thoughts on Creating a Backup Storage System'/><author><name>naclosagc</name><uri>http://www.blogger.com/profile/03306870871009241032</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
