A Method For Discovering And Downloading Hidden Web Content

UC Case No. 2004-656

 

SUMMARY

Researchers in the Computer Science Department at UCLA have developed a method for searching hidden web content that has previously been difficult to gather for the end user.

 

BACKGROUND

Current internet search engines are limited in their ability to search web-based databases that are accessible only by querying them directly. Typical search engines such as Google use a crawling system to gather web content: the search engine recursively follows links from page to page, discovering more links on each subsequent page, until a stopping condition is met. Although other search engines and algorithms can query these databases, they do not actually download the database content that the end user wants. The ability to download this hidden web content can be highly valuable to companies, with uses such as personalized web search, creation of web directories, integration of hidden content into e-commerce stores, and supplying key information for business decisions.
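To make the limitation of link-based crawling concrete, here is a minimal sketch of a breadth-first crawl over a toy in-memory link graph. The graph `TOY_WEB`, the function `crawl`, and the `max_pages` stopping condition are illustrative assumptions, not a description of how any real search engine is implemented. The two database records are never reached because no page links to them; they can only be obtained by submitting a query.

```python
from collections import deque

# Hypothetical toy web: page -> pages it links to.
# The "db/..." records sit behind a search form, so no page links to them.
TOY_WEB = {
    "home": ["about", "search-form"],
    "about": ["home"],
    "search-form": [],
    "db/record-1": [],
    "db/record-2": [],
}


def crawl(start, links, max_pages=100):
    """Follow links breadth-first until none remain or max_pages is reached."""
    visited = []            # pages in the order they were fetched
    seen = {start}          # pages already queued, to avoid revisiting
    frontier = deque([start])
    while frontier and len(visited) < max_pages:
        page = frontier.popleft()
        visited.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return visited


reached = crawl("home", TOY_WEB)
hidden = sorted(set(TOY_WEB) - set(reached))
print("reached:", reached)
print("never reached by following links:", hidden)
```

In this toy setting the crawler stops at the search form, which is the gap that hidden web content falls into.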

 

INNOVATION

UCLA researchers have designed a system for searching the internet that interacts with web-based databases by automatically generating queries for their search pages. Unlike other approaches to searching web-based databases, which only summarize a database or identify its structure, this invention downloads the content of the database itself. It can also use the results of its previous queries to generate the next query, so that each query returns the most relevant content.
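The idea of generating each query from the results of earlier ones can be illustrated with a small sketch. Everything here is a simplifying assumption made for illustration: the in-memory `TOY_DB`, the `search` stand-in for a site's search form (its `limit` parameter models a site that returns only part of its matches), and the keyword-selection heuristic, which picks the not-yet-issued word that appears in the most downloaded documents. It is not the inventors' actual query-selection method.

```python
from collections import Counter

# Hypothetical database hidden behind a search form: id -> text.
TOY_DB = {
    1: "liver cancer treatment",
    2: "cancer screening guidelines",
    3: "screening for diabetes",
    4: "diabetes insulin dosage",
    5: "insulin pump recall",
}


def search(keyword, limit=2):
    """Stand-in for a search form: returns at most `limit` matching ids."""
    hits = [doc_id for doc_id, text in sorted(TOY_DB.items())
            if keyword in text.split()]
    return hits[:limit]


def adaptive_crawl(seed, max_queries=10):
    """Issue keyword queries, choosing each next keyword from downloaded text."""
    downloaded = {}   # id -> text retrieved so far
    issued = []       # keywords already sent, in order
    keyword = seed
    while keyword is not None and len(issued) < max_queries:
        issued.append(keyword)
        for doc_id in search(keyword):
            downloaded[doc_id] = TOY_DB[doc_id]
        # Count, for each unissued word, how many downloaded documents contain it.
        counts = Counter(
            word
            for text in downloaded.values()
            for word in set(text.split())
            if word not in issued
        )
        # Assumed heuristic: most frequent word first, ties broken alphabetically.
        keyword = min(counts, key=lambda w: (-counts[w], w)) if counts else None
    return downloaded, issued


docs, queries = adaptive_crawl("cancer")
print(len(docs), "documents downloaded using queries:", queries)
```

Starting from a single seed keyword, the loop discovers new vocabulary in each batch of results and uses it to reach documents that the seed alone would not return. A practical system would need a better estimate of which keyword is likely to return the most new documents per query.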

 

POTENTIAL APPLICATIONS

  • Internet information searching

 

ADVANTAGES

  • Automatic generation of queries for web databases with no human interaction
  • Generated queries are more effective in discovering content within the web databases
  • Robust system design handles partial result lists returned by the web databases
  • Maximizes amount of downloadable web content and uses resources efficiently
For More Information:
Joel Kehle
Business Development Officer
joel.kehle@tdg.ucla.edu
Inventors:
Alexandros Ntoulas
Petros Zerfos
Junghoo Cho