Brief Description of tool
IBM Word Cloud Generator is an easy-to-use desktop application dedicated to creating word clouds.
A word cloud is a visual representation of text data in which the importance of each word is shown by its font size or color, depending on how frequently that word occurs. Since the software is not available on the web, an equivalent alternative, Tagxedo, has been used.
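The frequency-to-size idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how IBM Word Cloud Generator or Tagxedo work internally; the function name `word_sizes` and the size range of 10 to 48 points are assumptions made for the example.

```python
import re
from collections import Counter

def word_sizes(text, min_size=10, max_size=48):
    """Map each word to a font size that grows linearly with its frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        word: round(min_size + (count - lo) * (max_size - min_size) / span)
        for word, count in counts.items()
    }

# "data" occurs three times, "cloud" twice, "word" once.
sizes = word_sizes("data cloud data word data cloud")
print(sizes)
```

The most frequent word gets the largest size and the rarest the smallest; a real generator would also handle layout, color and shape.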
Description of Data :
Getting Data directly from Web
The input can be the URL of any web page, a Twitter ID, or a text document.
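When the input is a web page, the visible text has to be separated from the markup before words can be counted. A minimal sketch of that step, using only Python's standard `html.parser` module, might look like the following. The class name `TextExtractor`, the helper `html_to_text` and the sample HTML string are made up for this example; how Tagxedo actually extracts text is not documented here.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

# Hypothetical page content used only for illustration.
sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Word clouds</h1><p>Frequent words look bigger.</p></body></html>"
)
text = html_to_text(sample)
print(text)
```

For a live page, the HTML string could first be downloaded with `urllib.request.urlopen` and decoded before being passed to `html_to_text`.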
Using Features
Using data from the VGSOM website and the Yantrajaal blog, the following word clouds were created.
Business value
· It can be used to analyze and contrast the speeches of famous personalities.
· Creating company logos and advertising material. Example: Unilever.
· Critiquing a resume: it shows what would stand out to recruiters.
· Analyzing search keywords and applying the findings to website optimization.
· Summarizing a conference session.
Drawback
· The software is not available on the web, so an equivalent alternative has been used.
· IBM Word Cloud Generator ignores words such as “a” and “the”, and it does not distinguish between words such as “it” and “IT”.
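The second drawback can be seen in a small Python sketch. It assumes a tiny stop-word list and simple whitespace tokenization, both chosen only for illustration; the tool's real stop-word list is not known here. With case folding on, the pronoun "it" and the abbreviation "IT" collapse into one entry.

```python
from collections import Counter

# Illustrative subset only, not the tool's actual stop-word list.
STOP_WORDS = {"a", "an", "the"}

def count_words(text, fold_case=True, stop_words=STOP_WORDS):
    """Count words, dropping stop words and optionally ignoring case."""
    tokens = text.split()
    if fold_case:
        tokens = [t.lower() for t in tokens]
    return Counter(t for t in tokens if t.lower() not in stop_words)

sentence = "IT is a field and it is the future"
folded = count_words(sentence)                  # "IT" and "it" are merged
exact = count_words(sentence, fold_case=False)  # "IT" and "it" stay separate
print(folded["it"], exact["IT"], exact["it"])
```

Keeping case, as in the second call, is one way to preserve the distinction, at the cost of splitting words that merely start a sentence with a capital letter.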
Needlebase
Introduction
Needlebase is a platform for acquiring, integrating, cleansing, analyzing and publishing data on the web. It helps with:
- Acquiring data from multiple sources
- Merging, removing duplicates and cleansing: merges, edits and deletions persist even after the original data is refreshed from its source.
- Building and publishing custom data views: list, table, grid, or map.
Needlebase dramatically reduces the time, cost, and expertise needed to build and maintain comprehensive databases of practically anything.
Capabilities
- Data Acquisition
- Data Integration
- Data Publishing
- Feature Summary
Needlebase Data Acquisition Features
- Import data from complex websites via a simple data-tagging interface.
- No knowledge of programming, scripting, HTML DOM structure or regular expressions is required.
- Imports data from XML, CSV, and Excel formatted files.
- Supports bulk data upload
- Normalizes common data types including dates, times, names, titles, numbers, URLs, phone numbers, and prices
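To give a feel for what normalizing common data types means, here is a minimal Python sketch for three of the listed types: dates, phone numbers and prices. The accepted date formats, the function names and the sample values are all assumptions for this example; Needlebase's own normalization rules are not described in this document.

```python
import re
from datetime import datetime

# Assumed input formats; a real tool would recognize many more.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")

def normalize_date(value):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def normalize_phone(value):
    """Keep only the digits of a phone number."""
    return re.sub(r"\D", "", value)

def normalize_price(value):
    """Extract the first number in the string, ignoring thousands separators."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", value)
    return float(match.group().replace(",", "")) if match else None

print(normalize_date("25/12/2011"))
print(normalize_phone("+91 (33) 2555-1234"))
print(normalize_price("Rs. 1,250.50"))
```

After this step, values that were written differently on different sites can be compared and merged directly.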
Tutorial
Getting Started
- Needlebase is a web-based tool, so logging in is required.
- Visit http://www.needlebase.com in your browser and log in with your registered email ID.
Create a new database
Enter a new database topic and an optional description.
Add a new data source
We are collecting data on consulting companies in a particular geography from the fundoodata website, which publishes company data. Enter the following URL into the text box and click the start button.
http://www.fundoodata.com/
This takes us to the data tagging screen, where we will train Needlebase to collect data from the data source.
Tag the start page
Once we have trained Needlebase on a website or other data source, Needlebase will be able to collect data automatically from that source thereafter. The right-hand side of the screen is the Needlebase tagging panel, and the left-hand side shows the source page.
The green form field tag and the form submit button tag are applied on the source page.
Tag the search results pages
In Page Group 2, click "link to follow" and tag the links of the first three companies, starting from the top. Needlebase can guess a pattern from the sample tags placed and do the rest of the tagging for us.
Tag the details pages
In Page Group 3 we can see the address, city, phone number, website, etc. of the companies found. Here we create the relevant new fields in the tagging panel and then tag the corresponding values on the source page. Click the done button at the top right of the page to finish tagging and go back to the data sources list.
Collect data from the website
Click the collect now link.
When we click collect now, the pane immediately displays "Collecting (queued)" and then "Collecting now", with an indication of how many pages the system has traversed.
Eventually, the system will complete its data collection run. When complete, the display will look like: "collected (date and time)", and the Nodes Created list will show the total number of nodes collected along with subtotals for each of the types. We collected 42 companies.
Needlebase Data Integration:
Multi-source data—whether from websites, feeds, or private uploads—is inevitably riddled with redundant, incomplete, or mutually contradictory information.
Needlebase helps by:
- automatically mapping data from all sources into one consistent data model
- automatically merging data items that agree on key properties (e.g., companies that share the same name and address)
- automatically identifying clusters of similar items and proposing them as candidates for manual merging
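The last two points, automatic merging on key properties and proposing similar items for manual merging, can be sketched in Python as below. This is a rough illustration under stated assumptions, not Needlebase's algorithm: the key fields (`name`, `address`), the similarity threshold of 0.8, the use of `difflib.SequenceMatcher` and the sample company records are all invented for the example.

```python
from difflib import SequenceMatcher

def merge_exact(records, keys=("name", "address")):
    """Merge records whose key fields match, ignoring case and outer spaces."""
    merged = {}
    for record in records:
        key = tuple(record.get(k, "").strip().lower() for k in keys)
        target = merged.setdefault(key, {})
        for field, value in record.items():
            # Earlier values win; later records only fill in missing fields.
            if value and field not in target:
                target[field] = value
    return list(merged.values())

def similar_pairs(records, field="name", threshold=0.8):
    """Propose pairs whose field values are similar but not merged yet."""
    pairs = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            ratio = SequenceMatcher(None, a[field].lower(), b[field].lower()).ratio()
            if ratio >= threshold:
                pairs.append((a[field], b[field]))
    return pairs

# Made-up sample data, not collected from any website.
records = [
    {"name": "Acme Consulting", "address": "12 Park Street", "phone": ""},
    {"name": "acme consulting", "address": "12 Park Street", "phone": "033-5550100"},
    {"name": "Acme Consultants", "address": "14 Lake Road", "phone": ""},
]
merged = merge_exact(records)
candidates = similar_pairs(merged)
print(merged)
print(candidates)
```

The first two records agree on name and address and are merged automatically, while the third is only flagged as a candidate, leaving the final decision to the user.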
Map View
This tool can be used effectively by placement cells and sponsorship teams to acquire company details (name, category, board number, address, etc.) from websites such as www.fundoodata.com and www.justdial.com.
It can also be used to view reports as desired by the user.
A detailed tutorial for working with this tool is available at:
https://my.needlebase.com/docs/NeedleTutorialDomainCreation3.html
Presentation: http://prezi.com/9ervpzaeorkj/tagxedo-needlebase-vgsom/













