A Method For Discovering And Downloading Hidden Web Content

UC Case No. 2004-656

SUMMARY

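Researchers in the Computer Science Department at UCLA have developed a method for discovering and downloading hidden web content, meaning content held in web-based databases that can be reached only by querying them, which has previously been difficult for end users to gather.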

BACKGROUND

Current internet search engines are limited in their ability to search web-based databases whose content is accessible only by querying them directly. Typical search engines like Google discover web content by crawling: the search engine follows the links on each webpage to find further pages, and repeats this until a stopping condition is met. Content that sits behind a search form is typically not linked from other pages, so a link-following crawler does not reach it. Although other search engines and algorithms can query these databases, they do not actually download the database content that the end user wants. The ability to download this hidden web content can be highly valuable for applications such as personalized web searching, creation of web directories, integration of hidden content in e-commerce stores, and providing key information for business decisions.
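
The crawling behavior described above can be sketched in a few lines. This is a minimal illustration on an in-memory link graph, not any real search engine's crawler: the page names, the `LINKS` table, the `crawl` function, and the `max_pages` stopping condition are all assumptions made for the example, and it uses an iterative breadth-first traversal rather than literal recursion.

```python
from collections import deque

# Hypothetical toy link graph: page -> pages it links to.
# "record-1" and "record-2" stand for database records that are only
# returned by the search form and have no inbound links.
LINKS = {
    "home": ["about", "search"],
    "about": ["home"],
    "search": [],
    "record-1": [],
    "record-2": [],
}

def crawl(seed, links, max_pages=100):
    """Follow links breadth-first until no pages remain or max_pages is hit."""
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited

reached = crawl("home", LINKS)
hidden = sorted(set(LINKS) - set(reached))
print("reached:", reached)
print("never reached:", hidden)
```

In this toy graph the crawler reaches the search page itself but none of the records behind it, which is the gap the listing refers to as hidden web content.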

INNOVATION

The UCLA researchers have designed a system for searching the internet that interacts with web-based databases by automatically generating queries for their search pages. Unlike other approaches to searching web-based databases, which return only a summary of a database or identify its structure, this invention downloads the content of the database itself. It can also draw on its previous queries and searches to generate the next query, one that will return the most relevant content.
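
As a rough illustration of query generation that learns from earlier results, the sketch below harvests a small in-memory database through a keyword search function. Everything in it is an assumption made for the example: the `DOCS` table, the `search` stand-in for a site's search form, the `limit` on returned results, and the greedy rule of choosing the not-yet-issued term that appears in the most downloaded documents. It is one plausible heuristic, not the inventors' actual method.

```python
from collections import Counter

# Hypothetical hidden database: document id -> text.
DOCS = {
    1: "liver cancer treatment",
    2: "lung cancer screening",
    3: "heart disease treatment",
    4: "heart surgery recovery",
    5: "diabetes diet plan",
}

def search(keyword, limit=2):
    """Stand-in for a search form; returns at most `limit` matching ids."""
    hits = [doc_id for doc_id, text in sorted(DOCS.items())
            if keyword in text.split()]
    return hits[:limit]

def harvest(seed, max_queries=10):
    """Issue keyword queries, choosing each next keyword from prior results."""
    downloaded = {}
    issued = []
    query = seed
    while query is not None and len(issued) < max_queries:
        issued.append(query)
        for doc_id in search(query):
            downloaded[doc_id] = DOCS[doc_id]
        # Count, for each term, how many downloaded documents contain it.
        counts = Counter(term for text in downloaded.values()
                         for term in set(text.split()))
        # Pick the most common term not yet issued (ties broken alphabetically).
        candidates = sorted((t for t in counts if t not in issued),
                            key=lambda t: (-counts[t], t))
        query = candidates[0] if candidates else None
    return downloaded, issued

downloaded, issued = harvest("cancer")
print("queries issued:", issued)
print("documents downloaded:", sorted(downloaded))
```

Starting from the single seed keyword, the loop retrieves documents 1 through 4 without human input. Document 5 shares no term with the others, so this simple rule never finds it, which shows why the choice of the next query matters for coverage and cost.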

POTENTIAL APPLICATIONS

Internet information searching

ADVANTAGES

Automatic generation of queries for web databases with no human interaction
Generated queries are more effective in discovering content within the web databases
Robust system design can handle partial list returns from the web databases
Maximizes the amount of web content downloaded while using resources efficiently

For More Information:

Joel Kehle

Business Development Officer

joel.kehle@tdg.ucla.edu

Inventors:

Alexandros Ntoulas

Petros Zerfos

Junghoo Cho

Categories:

Software & Algorithms

Software & Algorithms > Artificial Intelligence & Machine Learning