




版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認領(lǐng)
文檔簡介
1、COMP2311COMP231File and Index StructureCOMP2312OutlineDisks and FilesIndexes and DatabasesHash FilesCOMP2313Disks and FilesDBMS stores information on (“hard”) disks.A disk is a sequence of bytes, each has a disk address.READ: transfer data from disk to main memory (RAM).WRITE: transfer data from RAM
2、 to disk.Data are stored and retrieved in units called disk blocks or pages.Each page has a fixed size, say 512 bytes. It contains a sequence of records.In many (not always) cases, records in a page have the same size, say 100 bytes.COMP2314Why Not Store Everything in Main Memory?Costs too much. RAM
3、 is much more expensive than disk.Main memory is volatile. We want data to be saved between runs. Typical storage hierarchy:Main memory (RAM) for currently used data.Disk for the main database (secondary storage).Tapes for archiving older versions of the data (tertiary storage).COMP2315Components of
4、 a DiskPlattersSpindleDisk headArm movementArm assemblyTracksSectorBlock size is a multiple of sector size (which is fixed).COMP2316File OrganizationDatabase is stored as a collection of files. Each file is a sequence of records. A record is a sequence of fields.one file one tableone record one tupl
5、ea record/tuple has a fixed lengthEasy to implement but limited by the file systemCOMP2317Fixed-Length RecordsSimple approach:store record i starting from byte n(i - 1), where n is the size of each record.Record access is simple but records may cross disk blocks.When record i is deleted, how do you
6、handle the released space?Shifting records:move records i+1,n to i, n-1move record n to ilink all free records on a free listiCOMP2318Sequential File OrganizationSuitable for application that require sequential processing of the entire file The records in the file are ordered by a search-keyCOMP2319
7、Sequential File Organization (cont.)Deletion use pointer chainsInsertion must locate the position in the file where the record is to be recordif there is free space insert thereif no free space, insert the record in an overflow blockIn either case, pointer chain must be updatedNeed to reorganize the
8、 file from time to time to restore sequential orderCOMP23110Clustering File OrganizationGood for queries involving depositor customer, and for queries involving one single customer and his accountsBad for queries involving only customerResult in variable size recordsSimple file structure stores each
9、 relation in a separate file can instead store several relations in one file using a clustering file organizationE.g., clustering organization of customer and depositor.1 clustercustomerdepositorCOMP23111OutlineDisks and FilesIndexes and DatabasesHash FilesHierarchy structure12Indexes and DatabasesA
10、 table(conceptual)文件綱要Index 1(Ordered indices, B-tree, hash)Index 2(Ordered indices, B-tree, hash)Records are physicallystored in a hash tablebased on a selected keyCOMP23113Basic ConceptsIndexing mechanisms speed up access to desired dataE.g. If you know the call number of a book, you can go direct
11、ly to the book shelve in the library; otherwise you have to search through all the book shelvesSearch key attributes used to look up records in a file.An index file consists of records (called index entries) of the format:關(guān)鍵字 物理內(nèi)存COMP23114Basic ConceptsIndex files are typically much smaller than the
12、 original file Two basic kinds of indices:Ordered indices: search keys are stored in sorted orderHash indices: search keys are distributed uniformly across “buckets” using a “hash function”.COMP23115Index Evaluation MetricsHow are indexing techniques evaluated?What access operations are supported ef
13、ficiently? E.g.,records with a specified value in an attribute (equality queries)or records with an attribute value falling in a specified range of values (range queries)Access timeInsertion timeDeletion timeSpace overhead (size of indexes / size of data records)COMP23116Ordered IndicesIn an ordered
14、 index, index entries are stored based on the search key value. E.g., author catalog in library.Indexes are mostly ordered indexes except those based on hash filesCOMP23117Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file.Also called
15、 clustering indexThe search key of a primary index is usually but not necessarily the primary key.Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index.COMP23118Index Structure Design: Dense Index FilesEvery se
16、arch-key value in the data file is indexed.Data recordsDense indexCOMP23119Sparse Index FilesNot all of the search-key values are indexedAdv: reduce index size Disadv: slowerRecords in table are sorted by primary key valuesSearch is done in two steps: (i) find the largest possible key (ii) search th
17、e record forwardCOMP23120Example of Sparse Index FilesTo locate a record with search-key value K we:find index record with largest search-key value Ksearch file sequentially starting at the record to which index record pointsLess space and less maintenance overhead for insertion and deletions.Genera
18、lly slower than dense index for locating records.Good Tradeoff: sparse index with an index entry for every block in file. The disk must read a block of data into main memory anyway, so searching within the block cost little.COMP23121Multilevel IndexIf primary index does not fit in memory, access bec
19、omes expensive. (Why?)To reduce number of disk accesses to index records, treat primary index kept on disk as a sequential file and construct a sparse index on it.Outer index a sparse index of primary indexInner index the primary index fileIf even outer index is too large to fit in main memory, yet
20、another level of index can be created, and so on (multi-level indices).Indices at all levels must be updated on insertion or deletion from the file.COMP23122Multilevel Index (Cont.)IndexBlock 0IndexBlock 1Inner indexouter indexDatablock 0Datablock 1Datablock 2Datablock 3COMP23123Index Update: Deleti
21、onIf deleted record was the only record in the file with its particular search-key value, the search-key is deleted from the index also.Single-level index deletion:Dense indices similar to file record deletionCOMP23124Index Update: Deletion in Sparse IndicesSparse indices if an entry for the search
22、key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order). If the next search-key value already has an index entry, the entry is deleted instead of being replaced.DowntownCOMP23125Index Update: InsertionSingle-level in
23、dex insertion:Perform a lookup using the search-key value appearing in the record to be inserted.Dense indices - if the search-key value does not appear in the index, insert it.Sparse indices - if index stores an entry for each block of the file, no change needs to be made to the index unless a new
24、block is created. In this case, the first search-key value appearing in the new block is inserted into the index.Multilevel insertion (as well as deletion) algorithms are simple extensions of the single-level algorithms.COMP23126Secondary IndicesYou can organize a file (say, into indexed sequential
25、file) based on one attribute only (e.g., emp#), but it is not adequate in practiceFrequently, one wants to find all the records whose values in a certain field (which is not the search-key of the primary index) satisfy some condition.Example 1: if the account database stored sequentially by account
26、number, we may want to find all accounts in a particular branch.Example 2: as above, but where we want to find all accounts with a specified balance or range of balances.We can have a secondary index with an index record for each search-key value: an index record points to a bucket that contains poi
27、nters to all the actual records with that particular search-key value.COMP23127Secondary Index on Balance Field of AccountNote: the file is already indexed basedon branch-nameSecondary IndexCOMP23128Primary and Secondary IndicesSecondary indices have to be dense because records are not sorted by the
28、 secondary index valuesIndices offer substantial benefits when searching for records, always select some attributes to index and re-examine the selection periodicallyWhen a file is modified, every index on the file must be updated. Updating indices imposes overhead on database modificationSequential
29、 scan using primary index is efficient, but a sequential scan using a secondary index is expensive (each record access may fetch a new block from disk.)COMP23129OutlineDisks and FilesIndexes and DatabasesHash FilesCOMP23130Hash FilesIn previous data structure course, you may learn hash table, which
30、is main-memory residentA hash method includes a hash function and a collision handling mechanismIn main memory, the unit of access is based on byte or word sizesIn disk-based hash file, the unit of access is based on disk block size (from 512 to 2048 bytes, depending on OS)COMP23131Hash FilesStatic
31、HashingDynamic HashingCOMP23132Static External HashingBucket 0Bucket iA hash file consists of M buckets of the same size: bucket0, bucket1,. bucketM-l Collisions occur when a new record hashes to a bucket that is already full. A new bucket is created and chained to the overflowed bucketTo reduce ove
32、rflow records, a hash file is typically kept 70-80% fullRecord to beinserted withkey Ki = h(K)Hashfunctionh()OverflowbucketCOMP23133Static External Hashing PropertiesThe hash function h() should distribute the records uniformly among the buckets; otherwise, search time will increase due to many over
33、flow records An ideal hash function will assign roughly the same number of records to each bucket irrespective of the actual distribution of search-key values in the fileThe number of buckets M must be fixed when the file is created; not good when the file size changes widelyAs a design criteria, yo
34、u need to determine the load factorCOMP23134Static External Hashing ExampleHash file organization of account file, using branch-name as key.An example of a hash function defined on a set of charaters in a file organization:There are 10 bucketsTake the ASCII code of each character as an integerSum up
35、 the binary representations of the characters and then applies modulo 10 to the sum to get the bucket numberCOMP23135Example of hash File OrganizationBucket 4Bucket 5Bucket 7Bucket 8COMP23136Hash IndicesHashing can be used not only for file organization, but also for index-structure creation. A hash
36、 index organizes the search keys, with their associated record pointers, into a hash file structure.Hash indices are always secondary indiceshash index(e.g., on salary)data file (e.g., hash or index sequential on emp#)COMP23137Example of hash IndexBucket 0Bucket 1Bucket 2Bucket 3Bucket 4Bucket 5Buck
37、et 6Primary indexUsed as asecondaryindexOverflowbucketHash Index on account-numberCOMP23138Hash FilesStatic HashingDynamic HashingCOMP23139Dynamic HashingGood for database that grows and shrinks in sizeAllows the hash function to be modified dynamicallyWhen the hash function takes modulo 10 in the p
38、revious example, the number of buckets is fixed to 10The trick is how to change the hash function so that the number of buckets can changeAnd at the same time, without the need of rehashing the existing records!Imagine if you change modulo 10 to modulo 13, then every existing record has to be rehash
39、ed not a good ideaCOMP23140Extendable HashingExtendable hashing - one form of dynamic hashingHashing function generates values over a large range - typically b-bit integers, with b = 32.At any time, use only a prefix of the b-bit integers to index into a table of bucket addresses. Let the length of
40、the prefix be i bits, 0 i 32Initially i = 1, meaning that it can index at most 2 bucketsWhen the 2 buckets are full, we can use 2 bits (i = 2), meaning that we can now index at most 4 buckets, and so on and so forth.i grows and shrinks as the size of the database grows and shrinks.Actual number of b
41、uckets is ij (more than one pointer to bucket j)allocate a new bucket z, and set ij and iz to the old ij+1.make the second half of the bucket address table entries pointing to j to point to zremove and reinsert each record in bucket j.recompute new bucket for Kj and insert record in the bucket (furt
42、her splitting is required if the bucket is still full)1222could be any number 1COMP23150Split in Extendable hash StructureTo split a bucket j when inserting record with search-key value Kj;If i = ij (only one pointer to bucket j)increment i and double the size of the bucket address table.Replace eac
43、h entry in the table by two entries that point to the same bucket.Re-compute new bucket address table entry for Kj, now i ij, so use the first case above. i0=2i1=2i2=2i3=2new recordCOMP23151Example: Use of Extendable Hash StructureBranch-nameh(branch-name)Brighton0010 1101 1111 1011 0010 1100 0011 0
44、000Downtown1010 0011 1010 0000 1100 0110 1001 1111Mianus1100 0111 1110 1101 1011 1111 0011 1010Perryridge1111 0001 0010 0100 1001 0011 0110 1101Redwood 0011 0101 1010 0110 1100 1001 1110 1011Round hill1101 1000 0011 1111 1001 1100 0000 0001Initial Hash structure, Bucket size=2Bucket 0hash address ta
45、ble0COMP23152Insert:0010 1101 1111 1011 0010 1100 0011 0000no bit is needed from the hash value (i=0)Brighton, A-217, 750Brighton, A-217, 7500COMP23153Insert:1010 0011 1010 0000 1100 0110 1001 1111no bit is needed from the hash value (i=0)Downtown, A-101, 500Brighton, A-217, 7500Downtown, A-101, 500Insert:bucket full, split records according to first bit (i=1)Downtown, A-101, 600COMP2315411Brighton0010 1101 1111 1011 0010 1100 0011 0000Downtown1010 0011 1010 0000 1100 0110 1001 1111Brighton, A-217, 750Downtown, A-101, 500Downtown, A-101, 600Insert:1
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負責。
- 6. 下載文件中如有侵權(quán)或不適當內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 茶園茶葉種植與茶葉種植基地基礎(chǔ)設(shè)施建設(shè)與運營承包經(jīng)營合同
- 文旅古鎮(zhèn)景區(qū)委托運營管理及旅游文創(chuàng)產(chǎn)品推廣合同
- 婚前商標權(quán)權(quán)屬確認及變更協(xié)議
- 湘少版英語四年級上冊課外活動計劃
- 2025年硫代硫酸鹽項目提案報告
- 中國煙草總公司合肥設(shè)計院筆試試題2024
- 詩歌的語言美:四年級古詩教學(xué)教案
- 電商行業(yè):智能營銷與訂單管理優(yōu)化方案
- 想象作文文具城的運動會350字(13篇)
- 2024-2025學(xué)年度九年級語文教學(xué)評估計劃
- 出納崗面試試題及答案
- 成人創(chuàng)傷性顱腦損傷院前與急診診治中國專家共識2025解讀
- 【公開課】+埃及+課件-2024-2025學(xué)年七年級地理下學(xué)期湘教版
- 北京開放大學(xué)2025年《企業(yè)統(tǒng)計》形考作業(yè)4答案
- 六下試卷計算題目及答案
- 廣東2025年中考模擬數(shù)學(xué)試卷試題及答案詳解
- GB/Z 27001-2025合格評定通用要素原則與要求
- 湖北省武漢市2025屆高中畢業(yè)生二月調(diào)研考試數(shù)學(xué)試題及答案
- 2025-2030中國屏蔽泵市場運行態(tài)勢分析及運營動態(tài)規(guī)劃研究報告
- 掛學(xué)籍協(xié)議書范本
- 2025年高考作文備考之十大熱點主題及寫作導(dǎo)引
評論
0/150
提交評論