SE250:Poster, Experiment Kit - mgar059, nirv002

From Marks Wiki
Jump to navigation Jump to search

Intranet Site!!!

This is where most of the work is going. It only works properly with IE, if you know how to fix it for firefox, let me know on the discussion :-)


More Information on Hashing


The Experiment Kit for hashing may be found here!



This page deals with Hashing

This is the outline of the content for our poster. This information is repeated on our Intranet Site. Sadly this is only accessable within the university network.


Poster

File:SE250 Poster hashing.jpg

Hashing- what is it?

Hashing is the process of reducing a data input to a shorter unique identifier. The advantage of using an identifier is that it can speed up the process of finding data. For example, it is much easier to compare identifier strings of 4 digits for equality, than the whole n digit string that it represents. An example of this is license plate numbers. License plate numbers can be used to represent a range of the cars details, including the type of car, its colour, year it was made etc. It is easier to represent all of this in a unique number rather than use a full description. Hashing also has other uses such as verifying large amounts of data quickly, detecting similarity between chunks of data, and hiding information in cryptographic applications.

http://en.wikipedia.org/wiki/Hashing

http://www.cs.duke.edu/~mlittman/courses/Archive/cps130-97/lectures/lect11/node6.html

http://www.cs.sunysb.edu/~skiena/214/lectures/lect21/lect21.html

Uses of hashing

Hashing is used to speed up data search, verify data, detect similarities in data, and to compare data while retaining privacy

http://www.techworld.com/storage/features/index.cfm?FeatureID=235

http://en.wikipedia.org/wiki/Hashing

http://www.answers.com/topic/hash-function?method=22

Hash Tables

Hash table searching is the most powerful method of retrieving data known.

Hash tables are a way of using hash values as indexes to search quickly. In a perfect hash function, where each input has a unique hash, data can be accessed with the first probe into the table. This takes O(1) runtime. Poor hash functions which produce many collisions can degrade the search to a linear operation. Good functions for this are the LOOKUP2 and buzhash functions.

http://www.comp.nus.edu.sg/~stevenha/programming/prog_searching.html


Verifying data

Hashes are very useful for comparing data. Two pieces of data can be compared by simply comparing their hash values. In a good hash function, a small change in the data will be reflected in a vastly different hash value. Costly hash functions which produce unique stings are usually used for these tasks. This is useful when sending data across a network, to check if the data was corrupted during transfer.

http://www.answers.com/topic/hash-function?method=22

Detecting similarity

For some cases, it is useful to have a less sensitive hash function for detecting similarity between data. One use is with music where you want to detect if two MP3 files are the same song. It is possible to have a function which can detect similar music even though they may have been recorded in either a studio or noisy room, or be of slightly different length.

http://citeseer.ist.psu.edu/haitsma01robust.html

Hashing in cryptography

Hash functions are very useful in cryptography. The hash function is used to obscure passwords and private data. This allows the hash values of two passwords to be compared for equality, removing the need to transmit the password in plain format, where it could be intercepted. It is very important with cryptographic functions to reduce collisions and eliminate the ability to derive the original information from the hash value.

http://www.answers.com/topic/hash-function?method=22


Problems with hashing

One of the main problems with hash functions is collisions. Collisions are where two distinct inputs give the same output.

A major problem when using hash functions is collisions. Hash functions attempt to associate complicated data with a more simple unique id for that data. A hash collision is where two different pieces of data are hashed to produce the same hash.

Hashes are often used to speed up the searching of data. By using a hash table, on average, data can be found at a constant time O(1) and at a rare worst case of O(n). When sorting data into a hash table, data is not stored in any specific order. As such, accessing data in a sorted manner can be extremely complex and in cases impossible within a reasonable time.

Hash functions which are good at providing unique hashes with randomly distributed output can be costly in terms of processing power and time to compute. Clustering occurs when a hash function returns values closely spaced within the output range. This can cause significant performance degradation in hash tables.

http://www.answers.com/hash%20table

http://www.answers.com/main/ntquery?s=hash+tables&gwp=13


Hashing process

There will be Diagrams

In hashing, a piece of data is run through a hashing function to give a shorter string of alphanumeric digits. This is often done through a combination of rotational and logic operations on the binary bits that make up the input. If one small bit of the data going into the function changes, the output of the function should also change (ideally, one digit changing in the input should cause at least half of the output digits to change). The idea is that each unique input will produce a corresponding unique hash. This gives a way of comparing large amounts of data with only a small string.

http://www.answers.com/main/ntquery?method=4&dsid=2222&dekey=MD5&curtab=2222_1

Experiment Kit

The Experiment Kit for hashing may be found here!


References

A store of our resources are located within: http://studwww.cs.auckland.ac.nz/~mgar059.

Overview

http://www.cs.duke.edu/~mlittman/courses/Archive/cps130-97/lectures/lect11/node6.html

http://www.cs.sunysb.edu/~skiena/214/lectures/lect21/lect21.html

Wiki/other: Specific points on hashing

http://www.answers.com/topic/hash-function?method=22

http://www.answers.com/main/ntquery?s=hash+tables&gwp=13 <-- Clustering

http://www.answers.com/main/ntquery?method=4&dsid=2222&dekey=MD5&curtab=2222_1


Buzhash: Authors page

http://www.serve.net/buz/hash.adt/java.000.html

Audio identification:

http://citeseer.ist.psu.edu/haitsma01robust.html

pdf file is in resources folder

Jamie's hashing presentation.


Misc

http://www.comp.nus.edu.sg/~stevenha/programming/prog_searching.html


Pictures

www.marioparga.com/hobbies5.html

www.drinvestigations.com