Screen Scraping using Java

1. The library used here is HtmlUnit. Its basic packages are

import com.gargoylesoftware.htmlunit.*;
import com.gargoylesoftware.htmlunit.html.*;

2. How to get a page using com.gargoylesoftware.htmlunit.WebClient

WebClient webClient = new WebClient();
webClient.setThrowExceptionOnFailingStatusCode(false);
webClient.setThrowExceptionOnScriptError(false);
webClient.setJavaScriptEnabled(false);
webClient.setTimeout(60000); // 1 minute
URL url = new URL("http://www.iberia.com/?language=en");
// Fetch and parse the page at that URL; homepage now holds the result
HtmlPage homepage = (HtmlPage)webClient.getPage(url);

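3. How to get a form from a page

// a single form, looked up by its name
HtmlForm frm1 = (HtmlForm) homepage.getFormByName("airPaxFormSimpl");

// alternatively, when the page has several forms, pick one by position
List listForms = homepage.getForms();
frm1 = (HtmlForm) listForms.get(5); // index 5 is the sixth form on the page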


4. How to set or get the value of a text box

HtmlTextInput p0name = (HtmlTextInput) frm1.getInputByName("passengers[0].name");
p0name.setValueAttribute("testA");
HtmlTextInput p0surname = (HtmlTextInput) frm1.getInputByName("passengers[0].surname1");
p0surname.setValueAttribute("sur");
HtmlTextInput p0surname2 = (HtmlTextInput) frm1.getInputByName("passengers[0].surname2");
p0surname2.setValueAttribute("sur");
// read a value back with getValueAttribute()
String currentName = p0name.getValueAttribute();
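
The input names above follow the pattern passengers[index].property. When several passengers have to be filled in, a small helper can generate those names. This is only a sketch: the class PassengerFields and its method fieldName are hypothetical (they are not part of HtmlUnit), and it is an assumption that the form names further passengers passengers[1], passengers[2] and so on with the same properties.

```java
// Hypothetical helper, not part of HtmlUnit: builds indexed input names
// such as "passengers[0].name" for use with frm1.getInputByName(...).
final class PassengerFields {

    private PassengerFields() {
    }

    // index is the zero-based passenger position, property the part after the dot
    static String fieldName(int index, String property) {
        if (index < 0) {
            throw new IllegalArgumentException("index must not be negative: " + index);
        }
        return "passengers[" + index + "]." + property;
    }

    public static void main(String[] args) {
        // Assumption: a second passenger uses index 1 with the same property names.
        String[] properties = {"name", "surname1", "surname2"};
        for (int i = 0; i < 2; i++) {
            for (String property : properties) {
                System.out.println(fieldName(i, property));
            }
        }
    }
}
```

Each generated name can then be passed to frm1.getInputByName(...) in the same way as the literal names used above.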


5. How to set or get the value of a check box


HtmlCheckBoxInput returnDirection = (HtmlCheckBoxInput) frm1.getInputByName("vehicle");
returnDirection.setChecked(true);

6. How to set or get the value of a dropdown

HtmlSelect departCountry = (HtmlSelect) frm1.getSelectByName("model");
HtmlOption optDepartCountry = departCountry.getOptionByValue("volvo");
optDepartCountry.setSelected(true);


7. How to set or get the value of a radio button

List list = frm1.getRadioButtonsByName("sex");
for (int i = 0; i < list.size(); i++) {
    HtmlRadioButtonInput rbFlex = (HtmlRadioButtonInput) list.get(i);
    // check the button whose value is "male", then stop looking
    if ("male".equals(rbFlex.getValueAttribute())) {
        rbFlex.setChecked(true);
        break;
    }
}

8. How to append a component to an existing form

HtmlHiddenInput sessID = (HtmlHiddenInput)frm1.getInputByName("BV_SessionID");
HtmlHiddenInput engnID = (HtmlHiddenInput)frm1.getInputByName("BV_EngineID");

System.out.println(sessID.getValueAttribute() + " " + engnID.getValueAttribute());

frm1.appendDomChild(sessID);
frm1.appendDomChild(engnID);

9. How to create a new component and add it to an existing form

// pagePlan and frmPlan are presumably the page and the form obtained as in steps 2 and 3
Map hmap1 = new HashMap();
hmap1.put("id", "__EVENTTARGET");
hmap1.put("name", "__EVENTTARGET");
HtmlHiddenInput hid1 = new HtmlHiddenInput(pagePlan, hmap1);
frmPlan.appendDomChild(hid1);

Map hmap2 = new HashMap();
hmap2.put("id", "__EVENTARGUMENT");
hmap2.put("name", "__EVENTARGUMENT");
HtmlHiddenInput hid2 = new HtmlHiddenInput(pagePlan, hmap2);
frmPlan.appendDomChild(hid2);
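
Both hidden fields are built from a map in which id and name carry the same value, so the map construction can be factored out. The following is a minimal sketch using only java.util: the class HiddenFieldAttributes and its method forName are hypothetical names introduced for illustration, and it assumes the (page, Map) constructor shown above accepts a map of plain strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HtmlUnit: builds the attribute map for a
// hidden input whose id and name are identical, as in the snippet above.
final class HiddenFieldAttributes {

    private HiddenFieldAttributes() {
    }

    static Map<String, String> forName(String fieldName) {
        if (fieldName == null || fieldName.isEmpty()) {
            throw new IllegalArgumentException("fieldName must not be empty");
        }
        // LinkedHashMap keeps the insertion order (id first, then name)
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("id", fieldName);
        attributes.put("name", fieldName);
        return attributes;
    }

    public static void main(String[] args) {
        // Print the attribute maps for the two fields used in this step.
        for (String fieldName : new String[] {"__EVENTTARGET", "__EVENTARGUMENT"}) {
            System.out.println(forName(fieldName));
        }
    }
}
```

The returned map could then stand in for hmap1 or hmap2, for example new HtmlHiddenInput(pagePlan, HiddenFieldAttributes.forName("__EVENTTARGET")).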


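10. How to remove a component from a form

// in two steps
HtmlElement elem = frmPlan.getHtmlElementById("SearchBy");
elem.remove();

// or in a single line; use one variant or the other, because once the element
// has been removed a second lookup by the same id will not find it
frmPlan.getHtmlElementById("SearchBy").remove();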
