Thursday, 16 February 2012

Word Cloud Generator & Needlebase [10BM60086] Siddharth Verma & [10BM60092] Swarnabha Shankar Ray


Word Cloud Generator 

Brief Description of tool

IBM Word Cloud Generator is a simple desktop application for creating word clouds.
 A word cloud is a visual representation of text data in which the importance of each word is shown by its font size or color, depending on how frequently that word occurs. Since the software is not available on the web, an equivalent alternative, Tagxedo, was used instead.
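
To make the frequency-to-size idea concrete, here is a minimal Python sketch. It is not how IBM Word Cloud Generator or Tagxedo is implemented; the function names, the linear scaling and the 10 to 48 point range are assumptions chosen only for illustration.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word occurs, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

def font_sizes(freqs, min_size=10, max_size=48):
    """Linearly scale each word's count to a font size in points.

    min_size and max_size are illustrative defaults, not values
    taken from any particular word cloud tool.
    """
    if not freqs:
        return {}
    lo, hi = min(freqs.values()), max(freqs.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        word: min_size + (count - lo) * (max_size - min_size) / span
        for word, count in freqs.items()
    }

if __name__ == "__main__":
    sample = "data data data cloud cloud word"
    for word, size in font_sizes(word_frequencies(sample)).items():
        print(word, size)
```

The most frequent word gets the largest size and the rarest the smallest; a real tool would also handle layout, color and shape.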



Description of Data:
Getting data directly from the web


The input data can be the URL of any web page, a Twitter ID, or a text document.

Using Features



Using data from the VGSOM website and the Yantrajaal blog, the following word clouds were created.




Business value
·         Analyzing and contrasting the speeches of famous personalities.
·         Designing a company logo or advertising material. Example: Unilever.
·         Critiquing a resume: seeing what would stand out to recruiters.
·         Analyzing search keywords and applying the findings to website optimization.
·         Summarizing a conference session.

Drawback
·         The software is not available on the web, so an equivalent alternative was used.

·         IBM Word Cloud Generator ignores words such as “a” and “the”, and it does not distinguish between words such as “it” and “IT”.
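
The second drawback can be illustrated with a short Python sketch. This is not the tool's actual code; the stop-word list and the `fold_case` switch are assumptions used to show why the pronoun “it” and the acronym “IT” end up in the same bucket when case is ignored.

```python
from collections import Counter

# Hypothetical stop-word list; real tools ship much longer ones.
STOP_WORDS = {"a", "an", "the"}

def count_terms(text, fold_case=True, stop_words=STOP_WORDS):
    """Count words, optionally folding case, and skip stop words."""
    counts = Counter()
    for token in text.split():
        token = token.strip(".,;:!?\"'()")
        if not token or token.lower() in stop_words:
            continue
        key = token.lower() if fold_case else token
        counts[key] += 1
    return counts

if __name__ == "__main__":
    sentence = "The IT team said it is a win for IT."
    print(count_terms(sentence, fold_case=True))   # "it" and "IT" merged
    print(count_terms(sentence, fold_case=False))  # counted separately
```

With case folding on, the pronoun and the acronym are counted together, which inflates a word that carries little meaning.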


Needlebase

Introduction

 Needlebase is a platform for acquiring, integrating, cleansing, analyzing and publishing data on the web. It helps in:
  •      Acquiring data from multiple sources
  •      Merging, removing duplicates and cleansing: merges, edits and deletions persist even after the original data is refreshed from its source
  •      Building and publishing custom data views: list, table, grid, or map
Needlebase dramatically reduces the time, cost, and expertise needed to build and maintain comprehensive databases of practically anything.

Capabilities

  •      Data Acquisition
  •      Data Integration
  •      Data Publishing
  •      Feature Summary

Needlebase Data Acquisition Features

  •      Imports data from complex websites via a simple data-tagging interface.
  •      No knowledge of programming, scripting, HTML DOM structure or regular expressions is required.
  •      Imports data from XML, CSV, and Excel formatted files.
  •      Supports bulk data upload.
  •      Normalizes common data types including dates, times, names, titles, numbers, URLs, phone numbers, and prices.
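
What "normalizing common data types" can mean in practice is sketched below in Python. Needlebase's own rules are not documented in this post, so the accepted date formats, the digits-only phone representation and the function names are all assumptions made for illustration.

```python
import re
from datetime import datetime
from decimal import Decimal

# Assumed input formats; a real system would recognize many more.
DATE_FORMATS = ("%d %B %Y", "%d/%m/%Y", "%Y-%m-%d")

def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def normalize_phone(raw):
    """Keep only the digits of a phone number."""
    return re.sub(r"\D", "", raw)

def normalize_price(raw):
    """Extract the first number from a price string as a Decimal."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))

if __name__ == "__main__":
    print(normalize_date("16 February 2012"))
    print(normalize_phone("+91 (3222) 282-312"))
    print(normalize_price("Rs. 1,250.50"))
```

Bringing values into one canonical form like this is what lets records from different sources be compared and merged later.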
Tutorial

Getting Started

  • Needlebase is a web-based tool, so you need to log in to use it.
  • Visit http://www.needlebase.com in your browser and log in with your registered email ID.

Create a new database

 Enter a new database topic and an optional description.

 Add a new data source
We are collecting data on consulting companies in a particular geography from the fundoodata website, which publishes company data. Enter the following URL into the text box and click the Start button.

                                        http://www.fundoodata.com/





This takes us to the data tagging screen, where we will train Needlebase to collect data from the data source.

Tag the start page

      Once we have trained Needlebase on a website or other data source, Needlebase will be able to collect data automatically from that source thereafter. The right-hand side of the screen is the Needlebase tagging panel, and the left-hand side shows the source page.

     The form field and the form submit button are tagged on the source page, using the green form field tag and form submit button tag.

    Tag the search results pages

           In Page Group 2, click "link to follow" and tag the links of the first three companies, starting from the top. Needlebase can guess a pattern from the sample tags placed and do the rest of the tagging for us.

    Tag the details pages

           In Page Group 3 we can see the address, city, phone number, website, etc. of the searched companies. Here we create the relevant new fields in the tagging panel and tag the corresponding fields on the source page. Click the Done button at the top right of the page to finish tagging and return to the data sources list.

    Collect data from the website

        Click the "collect now" link.


           When we click "collect now", the pane immediately displays "Collecting (queued)" and then "Collecting now", with an indication of how many pages the system has traversed.
    Eventually, the system completes its data collection run. The display then reads "collected (date and time)", and the Nodes Created list shows the total number of nodes collected, along with subtotals for each type. We collected 42 companies.

    Needlebase Data Integration:
    Multi-source data—whether from websites, feeds, or private uploads—is inevitably riddled with redundant, incomplete, or mutually contradictory information. 
    Needlebase helps by:
    •      automatically mapping data from all sources into one consistent data model
    •      automatically merging data items that agree on key properties (e.g., companies that share the same name and address)
    •      automatically identifying clusters of similar items and proposing them as candidates for manual merging
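
    The second point, merging items that agree on key properties, can be sketched in Python as follows. This is only an illustration of the idea and not Needlebase's algorithm; the record fields, the choice of name plus address as the key, and the rule that the first non-empty value wins are assumptions.

```python
def merge_key(record):
    """Build a comparison key from name and address, ignoring case and extra spaces."""
    name = " ".join(record["name"].lower().split())
    address = " ".join(record["address"].lower().split())
    return (name, address)

def merge_records(records):
    """Merge records that share the same key; the first non-empty value of each field wins."""
    merged = {}
    for record in records:
        target = merged.setdefault(merge_key(record), {})
        for field, value in record.items():
            if value and not target.get(field):
                target[field] = value
    return list(merged.values())

if __name__ == "__main__":
    # Made-up sample records, not data collected in this tutorial.
    companies = [
        {"name": "Acme Consulting", "address": "12 Park St", "phone": ""},
        {"name": "acme  consulting", "address": "12 park st", "phone": "033-1234"},
        {"name": "Beta Advisors", "address": "5 Lake Rd", "phone": "011-5678"},
    ]
    for company in merge_records(companies):
        print(company)
```

    The two Acme entries collapse into one record that keeps the original spelling of the name and picks up the phone number from the second source.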


         Data Publishing Features
         Data navigation and querying allow easy browsing of interconnected data. Needlebase renders data as attractive tables, groups, grids, lists, and Google maps.
       
             Table View                                             


                 List View


            Map View

           This tool can be used effectively by placement cell and sponsorship teams to acquire company details (name, category, board number, address, etc.) from websites such as www.fundoodata.com and www.justdial.com.
           It can also be used to view reports in the form the user wants.
           A detailed tutorial for working with this tool is available at:
          https://my.needlebase.com/docs/NeedleTutorialDomainCreation3.html
