by Vladimir Batagelj, University of Ljubljana

The structure of the challenge data file differs in part from the structure of the files a user can save from the Web of Science: it contains groups of articles instead of a sequence of single articles, and the citations are provided as subrecords instead of “ISI names”. For these reasons we had to substantially adapt our program WoS2Pajek
to fit the challenge data. The adapted version WoS2Pajek/MDTS is available at:
To lemmatise keywords (which can be obtained selectively from the fields DE, ID, TI and AB), WoS2Pajek needs the library MontyLingua.
Using the publication year partition it is possible to extract time slices from the networks, or to transform them into temporal networks, using the Pajek command

Operations/Transform/Add/Time Intervals determined by Partitions
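Outside Pajek, the selection of a time slice from the year partition can be sketched in a few lines of Python. This is only an illustration and not part of WoS2Pajek: it assumes the usual Pajek partition layout (a *Vertices n header followed by one class number per line), and the function names read_clu and time_slice as well as the toy data are hypothetical.

```python
def read_clu(text):
    """Parse the text of a Pajek partition (.clu) into a list of class numbers.

    Assumed layout: a '*Vertices n' header followed by n lines,
    one integer per vertex (here: the publication year).
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    if header[0].lower() != "*vertices":
        raise ValueError("expected a '*Vertices n' header")
    n = int(header[1])
    values = [int(v) for v in lines[1:n + 1]]
    if len(values) != n:
        raise ValueError("partition is shorter than announced in the header")
    return values


def time_slice(years, first, last):
    """Return the 1-based vertex numbers whose year lies in [first, last]."""
    return [v for v, y in enumerate(years, start=1) if first <= y <= last]


# Toy partition with four vertices (made-up years, for illustration only).
toy = "*Vertices 4\n1999\n2005\n2010\n2005\n"
years = read_clu(toy)
print(time_slice(years, 2000, 2009))  # [2, 4]
```

The resulting vertex numbers could then be used to restrict any of the networks to the works published in the chosen interval.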

Due to lack of time, the current version of WoS2Pajek/MDTS does not yet have the tkinter GUI. We run it from an IDLE editor window using the Run Module option. Here is an extract from the trace of its execution:

::: WoS2Pajek/MDTS 0.2 
by V. Batagelj, March 19, 2011/March 12, 2011
   based on Wos2Pajek 0.8

8 arguments required to run !
WoS directory          = ''
MontyLingua directory  = ''
project subdirectory   = 'second'
WoS file               = 'mdts11.WoS'
max num of vertices    = 2000000
ISInumber (True/False) = False
makeClean (True/False) = False
list step              = 1000
keywords from          = [True, True, False, False]
WoS2Pajek parameters
WoS  dir:  D:\data\embryo
ML   dir:  c:\Python25\Lib\site-packages\MontyLingua-2.1\Python
Proj dir:  second
WoS file:  mdts11.WoS
MaxNum  :  2000000
step    :  1000
ISI name:  False
clean   :  False
keywords:  [True, True, False, False]
>>> End of processing of WoS file
number of works        =  1942821
number of authors      =  519595
number of journals     =  174720
number of keywords     =  71966
number of institutions =  34718
number of countries    =  290
number of records      =  198016
number of duplicates   =  21870

*** FILES:
year of publication partition: .\second\Year.clu
language partition: .\second\Lang.clu
described / cited only partition: .\second\DC.clu
number of pages vector: .\second\NP.vec
citation network: .\second\
works X journals network: .\second\
works X keywords network: .\second\
works X countries network: .\second\
works X institutions network: .\second\
works X authors  network: .\second\

Currently it extracts 3 partitions, a vector, a citation network and 5 two-mode networks. Since the networks are compatible (in each of them at least one of the vertex sets is the set of all works), we can derive many additional networks using network multiplication, for example the collaboration network AW*WA, the network of citations between authors AW*Cite*WA, etc. See the paper
Kejžar, N., Korenjak Černe, S., Batagelj, V.: Network Analysis of Works on Clustering and Classification from Web of Science. Classification as a Tool for Research. Hermann Locarek-Junge, Claus Weihs eds. Proceedings of IFCS 2009. Studies in Classification, Data Analysis, and Knowledge Organization, Part 3, 525-536, Springer, Berlin, 2010. preprint
for possible analyses of the obtained data sets.
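The network multiplication mentioned above can be illustrated on a toy example. The sketch below is not taken from WoS2Pajek or Pajek: it represents networks as small dense matrices in plain Python, assumes that AW is the transpose of the works X authors matrix WA, and uses made-up data with 3 works and 2 authors. Real networks of this size would need a sparse representation.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def multiply(a, b):
    """Return the matrix product a * b."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


# Toy data (hypothetical): WA[w][a] = 1 if work w was written by author a.
WA = [[1, 1],
      [1, 0],
      [0, 1]]
# Cite[i][j] = 1 if work i cites work j.
Cite = [[0, 0, 0],
        [1, 0, 0],
        [1, 1, 0]]

AW = transpose(WA)

# Collaboration network: entry (a, b) counts the works co-authored by a and b;
# the diagonal holds the number of works of each author.
Co = multiply(AW, WA)
print(Co)      # [[2, 1], [1, 2]]

# Citations between authors: entry (a, b) counts the citations
# from works of author a to works of author b.
ACite = multiply(multiply(AW, Cite), WA)
print(ACite)   # [[1, 1], [2, 1]]
```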

The obtained Pajek files (March 20, 2011) are available for download at:

To do

  1. There is an irregularity in the raw data file obtained from WoS: there is no end-of-line separator between the data header and the first data line. The temporary solution is hard-wired in line 302:
    lines = line = "UI 0002817560\n"
  2. support for PT, CP
  3. add the tkinter GUI
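For item 1, a less hard-wired workaround might look like the following sketch. It is only an assumption about how the fused line could be handled, not the code used in WoS2Pajek/MDTS: it assumes that the first record starts with the field tag "UI " and that this tag does not occur inside the data header. The function name split_fused_header is hypothetical.

```python
def split_fused_header(first_line, tag="UI "):
    """Split a first line in which the data header and the first record are fused.

    Assumption: the first record starts with `tag`, and `tag` does not
    occur anywhere inside the header text itself.
    Returns a list of one line (nothing to split) or two lines.
    """
    pos = first_line.find(tag)
    if pos <= 0:
        # Tag missing, or the line already starts with it: leave unchanged.
        return [first_line]
    # Restore the missing end-of-line separator after the header.
    return [first_line[:pos] + "\n", first_line[pos:]]
```

The second element of the returned list would then take the place of the hard-wired value.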


  • March 12, 2011 - WoS2Pajek 0.8 adapted for MDTS file format
  • March 19, 2011 - WoS2Pajek/MDTS - support for fields LA, NC and NU added
  • March 20, 2011 - Pajek files produced from the challenge data
wos2pajek_mdts.txt · Last modified: 2011/03/23 17:36 by vlado
