by Vladimir Batagelj, University of Ljubljana
The structure of the challenge data file differs in part from
the structure of the files a user can save from the Web of Science
(groups of articles instead of a sequence of single articles; the
citations are provided as subrecords instead of “ISI names”). For these
reasons we had to adapt our program WoS2Pajek substantially
to fit the challenge data. The adapted version WoS2Pajek/MDTS is available at:
To lemmatise keywords (which can be taken selectively from the fields DE, ID, TI and AB), WoS2Pajek needs the library MontyLingua.
Using the publication year partition, it is possible to extract time slices from the networks, or to transform them into temporal networks with the Pajek command
Operations/Transform/Add/Time Intervals determined by Partitions
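The idea of a time slice can be illustrated outside Pajek with a minimal sketch. It assumes the usual Pajek partition layout (a `*Vertices n` header followed by one class value per vertex) and treats a slice as the set of links whose two endpoints both fall in a given range of years. The function names and the toy data are hypothetical; this is not the code of WoS2Pajek, nor of the Pajek command itself.

```python
def read_clu(lines):
    """Parse a Pajek partition: a '*Vertices n' header, then one class per vertex."""
    rows = [ln.strip() for ln in lines if ln.strip()]
    n = int(rows[0].split()[1])
    return [int(x) for x in rows[1:n + 1]]   # index i holds the class of vertex i+1


def time_slice(links, years, first, last):
    """Keep the links whose two endpoints were both published in [first, last]."""
    keep = {v for v, y in enumerate(years, start=1) if first <= y <= last}
    return [(u, v) for u, v in links if u in keep and v in keep]


# Toy data (hypothetical): four works and their publication years.
clu_text = """*Vertices 4
1998
2001
2003
2001
"""
years = read_clu(clu_text.splitlines())
links = [(2, 1), (3, 2), (4, 2), (3, 4)]   # pairs of vertex numbers
slice_2001_2003 = time_slice(links, years, 2001, 2003)
print(slice_2001_2003)
```

In this toy example the link (2, 1) is dropped, because work 1 was published in 1998, outside the chosen interval.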
Because of lack of time, the current version of WoS2Pajek/MDTS does not yet have the tkinter GUI. We run it from an IDLE editor window using the Run module option. Here is an extract from the trace of its execution:
    ::: WoS2Pajek/MDTS 0.2
        by V. Batagelj, March 19, 2011/March 12, 2011
        based on Wos2Pajek 0.8
    8 arguments required to run !
        WoS directory = ''
        MontyLingua directory = ''
        project subdirectory = 'second'
        WoS file = 'mdts11.WoS'
        max num of vertices = 2000000
        ISInumber (True/False) = False
        makeClean (True/False) = False
        list step = 1000
        keywords from = [True, True, False, False]
    WoS2Pajek parameters
        WoS dir:  D:\data\embryo
        ML dir:   c:\Python25\Lib\site-packages\MontyLingua-2.1\Python
        Proj dir: second
        WoS file: mdts11.WoS
        MaxNum :  2000000
        step :    1000
        ISI name: False
        clean :   False
        keywords: [True, True, False, False]
    >>> End of processing of WoS file
        number of works = 1942821
        number of authors = 519595
        number of journals = 174720
        number of keywords = 71966
        number of institutions = 34718
        number of countries = 290
        number of records = 198016
        number of duplicates = 21870
    *** FILES:
        year of publication partition: .\second\Year.clu
        language partition: .\second\Lang.clu
        described / cited only partition: .\second\DC.clu
        number of pages vector: .\second\NP.vec
        citation network: .\second\Cite.net
        works X journals network: .\second\WJ.net
        works X keywords network: .\second\WK.net
        works X countries network: .\second\WC.net
        works X institutions network: .\second\WI.net
        works X authors network: .\second\WA.net
Currently it extracts three partitions (Year, Lang, DC), a vector (NP), the citation
network (Cite) and five two-mode networks (WJ, WK, WC, WI, WA).
Since the networks are compatible (at least one set is the set
of all articles), we can derive, using network multiplication,
many additional networks. For example, with AW denoting the transpose of WA:
the collaboration network AW*WA, the number of citations between authors AW*Cite*WA, etc.
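Network multiplication can be read as ordinary matrix multiplication of the network matrices. The following is a minimal sketch on toy data, with AW taken as the transpose of WA. The helper names and the three-work example are hypothetical, and the convention that `Cite[i][j] = 1` means "work i cites work j" is an assumption; for networks of the size reported in the trace one would use sparse representations (as Pajek does) rather than dense lists.

```python
def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def multiply(a, b):
    """Ordinary matrix product of two dense matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


# Toy data (hypothetical): 3 works, 2 authors.
WA = [[1, 1],        # work 0 written by authors 0 and 1
      [1, 0],        # work 1 written by author 0
      [0, 1]]        # work 2 written by author 1
Cite = [[0, 0, 0],
        [1, 0, 0],   # work 1 cites work 0
        [1, 1, 0]]   # work 2 cites works 0 and 1 (assumed direction)

AW = transpose(WA)
collaboration = multiply(AW, WA)                      # AW*WA
author_citations = multiply(multiply(AW, Cite), WA)   # AW*Cite*WA

print(collaboration)
print(author_citations)
```

In the collaboration matrix the diagonal counts the works of each author and the off-diagonal entries count co-authored works. In AW*Cite*WA, entry (a, b) counts the citations from works of author a to works of author b.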
See the paper
Kejžar, N., Korenjak Černe, S., Batagelj, V.: Network Analysis of Works on Clustering and Classification from Web of Science. Classification as a Tool for Research. Hermann Locarek-Junge, Claus Weihs eds. Proceedings of IFCS 2009. Studies in Classification, Data Analysis, and Knowledge Organization, Part 3, 525-536, Springer, Berlin, 2010. preprint
for possible analyses of the obtained data sets.
The obtained Pajek files (March 20, 2011) are available for
lines = line = “UI 0002817560\n”