Method and System for Pick-and-Drop Sampling from Large Datasets
SUMMARY
UCLA researchers in the Department of Computer Science have developed a new algorithm that approximates large frequency moments of big data sets using pick-and-drop sampling.
BACKGROUND
With increasing data volume, analyzing the data becomes challenging. In some cases, the data is generated by a single event and stored for analysis, e.g., large simulations (financial or scientific). In other instances, the data is generated by many simultaneous events, such as daily sales data from online retailers. While each day's data may be analyzed efficiently, the combined data is likely too big for practical in-depth analysis. Approximate frequency moments could be used to analyze retailers' weekly or yearly sales figures when the data becomes impractically large to handle with conventional analysis.
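For concreteness, the k-th frequency moment of a data set is F_k = sum over distinct items i of f_i^k, where f_i is the number of times item i occurs (F_0 counts distinct items, F_1 is the total count, and F_2 measures skew). The minimal Python sketch below computes F_k exactly on a toy sales stream; the item names are illustrative only. The innovation described below approximates this quantity when the stream is too large to compute it exactly.

    from collections import Counter

    def frequency_moment(stream, k):
        # Exact k-th frequency moment: F_k = sum of f_i**k over
        # distinct items i, where f_i is item i's occurrence count.
        return sum(f ** k for f in Counter(stream).values())

    sales = ["sku1", "sku2", "sku1", "sku3", "sku1", "sku2"]
    # Frequencies: sku1 -> 3, sku2 -> 2, sku3 -> 1, so
    # F_2 = 3**2 + 2**2 + 1**2 = 14.
    print(frequency_moment(sales, 2))  # prints 14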
INNOVATION
UCLA researcher Rafail Ostrovsky has developed an algorithm that estimates higher frequency moments (F_k for k ≥ 3) of a given data stream. The algorithm provides useful statistics on the data set when the incoming data is too big to store or analyze efficiently.
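To illustrate the sampling mechanic, the sketch below processes the stream in fixed-size blocks, "picks" a uniformly random element in each block and counts its remaining occurrences there, and "drops" the currently held sample whenever the fresh pick's local count beats the held sample's running counter. This is a deliberately simplified toy, not the authors' exact procedure: the published algorithm's block sizes, drop rule, and final estimator are more careful, and the function name, parameters, and crude estimator here are assumptions for illustration.

    import random

    def pick_and_drop_fk(stream, block_size, k):
        # Toy sketch of pick-and-drop sampling; NOT the paper's exact
        # rule. One (sample, counter) pair is kept across the stream.
        sample, count = None, 0
        n = len(stream)
        for start in range(0, n, block_size):
            block = stream[start:start + block_size]
            # "Pick": a uniformly random position in the block; count
            # that element's occurrences from there to the block end.
            j = random.randrange(len(block))
            cand = block[j]
            cand_count = block[j:].count(cand)
            # The held sample keeps accumulating its block occurrences.
            if sample is not None:
                count += block.count(sample)
            # "Drop": replace the held sample when the fresh pick wins
            # (the paper uses a more careful comparison).
            if cand_count > count:
                sample, count = cand, cand_count
        # Crude single-sample estimate of F_k; the paper derives a
        # different estimator with provable guarantees.
        return n * count ** (k - 1)

    random.seed(0)
    data = ["a"] * 500 + ["b"] * 100 + list("cdefgh") * 50
    random.shuffle(data)
    print(pick_and_drop_fk(data, block_size=50, k=3))

Intuitively, heavy items survive the drop rule because their counters grow quickly, so the surviving sample is biased toward exactly the high-frequency items that dominate F_k for large k; averaging many independent runs reduces the variance of such an estimate.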
ADVANTAGES
- Provides analysis and robust statistics for very large and continuous data streams (e.g., online sales, commercial sales, big data science)
STATE OF DEVELOPMENT
Researchers have created and validated the algorithm.
RELATED MATERIALS
V. Braverman and R. Ostrovsky, "Approximating Large Frequency Moments with Pick-and-Drop Sampling," in Approximation, Randomization, and Combinatorial Optimization: Algorithms and Techniques (APPROX/RANDOM), 2013.